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ABSTRACT 

A  hyperspectral  unmixing  algorithm  that  %ds  multiple  sets  of  endmembers  is  proposed. 

The  algorithm,  called  Context  Dependent  Spectral  Unmixing  (CDSU),  is  a  local  approach  that 
adapts  the  unmixing  to  dicT erent  regions  of  the  spectral  space.  It  is  based  on  a  novel  function  that 
combines  context  idcntCcation  and  unmixing.  This  joint  objective  function  models  contexts  as 
compact  clusters  and  uses  the  linear  mixing  model  as  the  basis  for  unmixing. 

Several  variations  of  the  CDSU,  that  provide  additional  desirable  features,  are  also  proposed. 

First,  the  Context  Dependent  Spectral  unmixing  using  the  Mahalanobis  Distance  (CDSUM)  oo'crs  the 
advantage  of  identifying  non-spherical  clusters  in  the  high  dimensional  spectral  space.  Second,  the 
Cluster  and  Proportion  Constrained  Multi-Model  Unmixing  (CC-MMU  and  PC-MMU)  algorithms 
use  partial  supervision  infonnation,  in  the  form  of  cluster  or  proportion  constraints,  to  guide  the 
search  process  and  narrow  the  space  of  possible  solutions.  The  supervision  information  could  be 
provided  by  an  expert,  generated  by  analyzing  the  consensus  of  multiple  unmixing  algorithms,  or 
extracted  from  co-located  data  from  a  different  sensor.  Third,  the  Robust  Context  Dependent 
Spectral  Unmixing  (RCDSU)  introduces  possibilistic  memberships  into  the  objective  function  to 
reduce  the  ef  ect  of  noise  and  outliers  in  the  data.  Finally,  the  Unsupervised  Robust  Context 
Dependent  Spectral  Unmixing  (U-RCDSU)  algorithm  learns  the  optimal  number  of  contexts  in  an 
unsupervised  way.  The  performance  of  each  algorithm  is  evaluated  using  synthetic  and  real  data. 

We  show  that  the  proposed  methods  can  identify  meaningful  and  coherent  contexts,  and  appropriate 
endmembers  within  each  context. 

The  second  main  contribution  of  this  thesis  is  consensus  unmixing.  This  approach  exploits 

the  diversity  and  similarity  of  the  large  number  of  existing  unmixing  algorithms  to  identify  an  accurate  and  consistent 

set  of  endmembers  in  the  data.  We  run  multiple  unmixing  algorithms  using 

die? erent  parameters,  and  combine  the  resulting  unmixing  ensemble  using  consensus  analysis.  The 

extracted  endmembers  will  be  the  ones  that  have  a  consensus  among  the  multiple  runs. 

The  third  main  contribution  consists  of  developing  subpixel  target  detectors  that  rely  on 
the  proposed  CDSU  algorithms  to  adapt  target  detection  algorithms  to  different  contexts.  A  local 
detection  statistic  is  computed  for  each  context  and  then  all  scores  are  combined  to  yield  a  %ial 
detection  score.  The  context  dependent  unmixing  provides  a  better  background  description  and 
limits  target  leakage,  which  are  two  essential  properties  for  target  detection  algorithms. 
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ABSTRACT 


CONTEXT  DEPENDENT  SPECTRAL  UNMIXING 


Hamdi  Jenzri 


August  11,  2014 


A  hyperspectral  unmixing  algorithm  that  finds  multiple  sets  of  endmembers  is  proposed. 
The  algorithm,  called  Context  Dependent  Spectral  Unmixing  (CDSU),  is  a  local  approach  that 
adapts  the  unmixing  to  different  regions  of  the  spectral  space.  It  is  based  on  a  novel  function  that 
combines  context  identification  and  unmixing.  This  joint  objective  function  models  contexts  as 
compact  clusters  and  uses  the  linear  mixing  model  as  the  basis  for  unmixing. 

Several  variations  of  the  CDSU,  that  provide  additional  desirable  features,  are  also  proposed. 
First,  the  Context  Dependent  Spectral  unmixing  using  the  Mahalanobis  Distance  (CDSUm)  offers  the 
advantage  of  identifying  non-spherical  clusters  in  the  high  dimensional  spectral  space.  Second,  the 
Cluster  and  Proportion  Constrained  Multi-Model  Unmixing  (CC-MMU  and  PC-MMU)  algorithms 
use  partial  supervision  information,  in  the  form  of  cluster  or  proportion  constraints,  to  guide  the 
search  process  and  narrow  the  space  of  possible  solutions.  The  supervision  information  could  be 
provided  by  an  expert,  generated  by  analyzing  the  consensus  of  multiple  unmixing  algorithms,  or 
extracted  from  co-located  data  from  a  different  sensor.  Third,  the  Robust  Context  Dependent 
Spectral  Unmixing  (RCDSU)  introduces  possibilistic  memberships  into  the  objective  function  to 
reduce  the  effect  of  noise  and  outliers  in  the  data.  Finally,  the  Unsupervised  Robust  Context 
Dependent  Spectral  Unmixing  (U-RCDSU)  algorithm  learns  the  optimal  number  of  contexts  in  an 
unsupervised  way.  The  performance  of  each  algorithm  is  evaluated  using  synthetic  and  real  data. 
We  show  that  the  proposed  methods  can  identify  meaningful  and  coherent  contexts,  and  appropriate 
endmembers  within  each  context. 

The  second  main  contribution  of  this  thesis  is  consensus  unmixing.  This  approach  exploits 
the  diversity  and  similarity  of  the  large  number  of  existing  unmixing  algorithms  to  identify  an 


v 


accurate  and  consistent  set  of  endmembers  in  the  data.  We  run  multiple  unmixing  algorithms  using 
different  parameters,  and  combine  the  resulting  unmixing  ensemble  using  consensus  analysis.  The 
extracted  endmembers  will  be  the  ones  that  have  a  consensus  among  the  multiple  runs. 

The  third  main  contribution  consists  of  developing  subpixel  target  detectors  that  rely  on 
the  proposed  CDSU  algorithms  to  adapt  target  detection  algorithms  to  different  contexts.  A  local 
detection  statistic  is  computed  for  each  context  and  then  all  scores  are  combined  to  yield  a  final 
detection  score.  The  context  dependent  unmixing  provides  a  better  background  description  and 
limits  target  leakage,  which  are  two  essential  properties  for  target  detection  algorithms. 
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CHAPTER  1 


INTRODUCTION 

1.1  Hyperspectral  image  data  and  analysis 

Hyperspectral  imaging  contributes  significantly  to  earth  observation  and  remote  sensing 
[1-13].  It  is  also  used  in  food  safety  [14-17],  pharmaceutical  process  monitoring  and  quality  control 
[18-22],  as  well  as  in  biomedical  [23,24],  industrial  [25],  biometric  [26]  and  forensic  applications  [27]. 

Hyperspectral  sensors  capture  both  the  spatial  and  spectral  information  of  a  scene.  They 
collect  radiance  data  in  hundreds  of  contiguous  wavelengths.  The  focus  is  mainly  on  the  visible, 
near- infrared  and  shortwave  infrared  spectral  bands  (in  the  range  between  0.4/wi  and  2.5 /im  [7]). 
As  the  sensor  collects  data  over  a  region,  a  data  cube  is  generated.  Figure  1.1  illustrates  this  concept. 
The  data  cube  can  be  interpreted  as  a  stack  of  two-dimensional  images  captured  over  a  range  of 
wavelengths.  Each  element  of  the  data  cube  corresponds  to  the  radiance  measured  at  a  particular 
wavelength  at  one  ground  location  [1,28,29]. 

Spectral  and  spatial  resolutions  are  two  important  characteristics  of  a  hyperspectral  sensor. 
The  spectral  resolution  of  a  sensor  corresponds  to  the  range  of  wavelengths  over  which  radiance  values 
are  measured  and  combined  to  become  a  single  band  in  a  hyperspectral  image.  The  spatial  resolution 
corresponds  to  the  size  of  the  physical  area  on  the  ground  from  which  radiance  measurements  are 
taken  for  a  single  image  pixel.  As  the  area  corresponding  to  one  pixel  increases,  the  spatial  resolution 
of  the  image  decreases  [1,28]. 

The  main  appeal  for  hyperspectral  imaging  is  that  different  materials  reflect  and  emit  vary¬ 
ing  amounts  of  radiance  across  the  electromagnetic  spectrum.  In  other  words,  different  materials 
generally  have  unique  spectral  signatures.  Thus,  hyperspectral  sensors  can  be  used  to  identify  and 
distinguish  between  different  materials  in  a  scene  [28]. 

In  a  hyperspectral  image,  a  pixel  can  be  pure  or  mixed.  A  pure  pixel  corresponds  to  a 
single  material’s  radiance  values.  On  the  other  hand,  a  mixed  pixel  is  a  pixel  which  combines 
the  radiance  values  of  multiple  materials.  Typically,  most  pixels  are  mixed  as  a  result  of  the  low 
spatial  resolution  [1,28].  This  is  because  as  the  pixel’s  corresponding  area  on  the  ground  increases, 
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Figure  1.1:  Illustration  of  the  hyperspectral  imaging  concept.1 


neighboring  materials  are  likely  to  be  combined  into  the  image  pixel.  Mixed  pixels  also  occur  when 
the  different  materials  are  mixed  on  the  ground.  Beach  sand  is  a  common  example  for  this  type  of 
mixed  pixel  since  grains  of  different  materials  are  intermingled  [1], 

Pure  spectral  signatures,  or  the  constituent  spectra ,  in  an  imaged  scene  are  referred  to  as 
encLmembers  [1].  In  the  purest  sense,  endmembers  can  represent  unique  elements,  e.g.,  calcium,  iron, 
and  copper.  However,  in  the  practical  sense  of  hyperspectral  imaging,  the  endmembers  more  likely 
represent  disparate  macroscopic  entities,  e.g.,  desert,  forest,  metal,  and  salt  water  [30].  Due  to  the 
presence  of  mixed  pixels  in  a  hyperspectral  image,  spectral  unmixing  is  often  performed  to  decompose 
mixed  pixels  into  their  respective  endmembers  and  abundances.  Abundances  are  the  proportions 
of  the  endmembers  in  each  pixel.  In  estimating  the  endmembers  and  abundances  in  pixel  spectra, 
unmixing  algorithms  incorporate  philosophical  assumptions  regarding  the  physical  mechanisms  and 
mathematical  structure  by  which  the  reflectance  properties  from  disparate  substances  combine  to 
yield  the  mixed  pixel  spectra.  In  other  words,  spectral  unmixing  relies  on  the  definition  of  a  mixing 
model. 

Mixing  models  can  be  characterized  as  either  linear  or  nonlinear  [1,31].  The  linear  mixing 
1This  image  was  taken  from  [29]. 
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model  (also  known  as  the  convex  geometry  model)  holds  when  the  mixing  scale  is  macroscopic  [32]  and 
the  incident  light  interacts  with  just  one  material,  as  is  the  case  in  checkerboard  type  scenes  [33,34], 
In  this  case,  the  mixing  occurs  within  the  sensor  itself.  It  is  due  to  the  fact  that  the  spatial  resolution 
of  the  sensor  is  not  fine  enough.  The  light  from  the  materials,  although  almost  completely  separated, 
is  mixed  within  the  measuring  sensor.  Figure  1.2  depicts  linear  mixing  where  the  reflecting  surface  is 
portrayed  as  a  checkerboard  mixture,  and  the  incident  radiation  bounces  only  once  on  its  surface.  If 
the  total  surface  area  is  conceived  to  be  divided  proportionally  according  to  the  fractional  abundances 
of  the  constituent  substances,  then  the  reflected  radiation  will  convey,  with  the  same  proportions, 
the  characteristics  of  the  associated  media. 


Figure  1.2:  Linear  mixing  from  a  checkerboard  mixture  of  materials  with  a  single  reflection.1 


The  nonlinear  mixing  is  usually  due  to  the  light  interaction  with  multiple  materials  in  the 
scene.  These  interactions  can  be  at  a  classical  or  multilayered  level,  or  at  a  microscopic  or  intimate 
level.  Mixing  at  the  classical  level  occurs  when  light  scattered  from  one  or  more  objects,  is  reflected 
off  additional  objects,  and  eventually  measured  by  the  hyperspectral  sensor.  Microscopic  mixing 
occurs  when  two  materials  are  homogeneously  mixed  [35].  In  this  case,  the  interactions  consist 
of  photons  emitted  from  the  molecules  of  one  material  that  are  absorbed  by  molecules  of  another 
material,  which  may  in  turn  emit  more  photons.  Figure  1.3  illustrates  nonlinear  mixing  from  an 
intimate  mixture. 

Despite  its  simplicity,  the  linear  mixing  model  has  proved  to  be  an  acceptable  approximation 
of  the  light  scattering  mechanisms  in  many  real  scenarios.  Furthermore,  in  contrast  to  nonlinear 
mixing,  the  linear  mixing  model  is  the  basis  of  a  plethora  of  unmixing  algorithms  [1,30,36-48]. 

1This  image  was  taken  from  [1]. 
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Figure  1.3:  Nonlinear  mixing  from  an  intimate  mixture  of  materials.1 


1.2  Linear  Mixture  Model 

The  standard  model  used  to  perform  spectral  unmixing  is  the  convex  geometry  model  (also 
known  as  the  linear  mixing  model).  It  states  that  every  pixel’s  spectral  signature  is  a  convex  com¬ 
bination  of  endmembers  in  the  scene.  This  has  been  shown  to  hold  in  cases  where  the  endmembers 
are  mixed  by  the  spatial  resolution  of  the  imaging  sensor  [1,36].  If  the  convex  geometry  holds,  the 
endmembers  are  the  spectra  found  at  the  corners  of  a  convex  region  enclosing  all  the  spectra  in  the 
hyperspectral  scene.  This  model  is  defined  as  [1]: 

M 

Xj  ='^2pjkek  +  e:j,  j  =  l,...,N  (1.1) 

k— 1 

where  Xj  (lx  d)  is  the  spectral  signature  of  pixel  j,  d  is  the  number  of  spectral  bands,  N  is  the 
number  of  pixels  in  the  image,  M  is  the  number  of  endmembers,  €j  (1  x  d)  is  an  error  term,  pjk  is 
the  proportion  of  endmember  k  in  pixel  j,  and  (1  x  d)  is  the  kth  endmember.  The  proportions 
satisfy  the  following  constraints: 

M 

Pjk  >  0 ,  Vfc  =  1, M;  and  y 'pjk  =  1,  Wj  =  1  (1.2) 

k= 1 

In  the  hyperspectral  unmixing  literature,  the  constraints  in  (1.2)  are  referred  to  as  abundance  non¬ 
negativity  constraint  (ANC)  and  abundance  sum  constraint  (ASC),  respectively.  Equation  (1.1)  can 
be  rewritten  in  a  matrix  format  as: 

xj  =  PjE  +  Cj,  j  =  (1.3) 

where  E  ( M  x  d)  is  the  mixing  matrix  containing  the  endmembers,  p j  (1  x  M)  are  the  proportions 
of  pixel  j  in  the  M  endmembers,  and  Cj  (lx  d)  is  an  error  term.  The  constraints  in  (1.2)  can  also 

1This  image  was  taken  from  [1], 
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be  rewritten  in  a  matrix  format  as: 

P j  >  0,  and  IixmpJ  =  1,  Vj  =  1, N.  (1.4) 

where  1ixm  is  a  1  x  M  vector  of  ones. 

Assuming  that  the  rows  of  E  are  affinely  independent,  i.e. ,  e2  —  ei,  e3  —  el5  ...,  ejf  —  e!  are 
linearly  independent,  then  the  set 

f  M  1 

<  x  =  pE,  such  that  ^  pk  =  1  ,Pk  >  0,  k  =  1, ...,  M  >  (1.5) 

i.e.,  the  convex  hull  of  the  rows  of  E  (conv{E}),  is  a  (M  —  l)-simplex  in  Rd.  Figure  1.4  illustrates  a 
2-simplex  for  a  hypothetical  mixing  matrix  E  containing  three  endmembers.  In  this  figure,  the  green 
points  denote  spectral  vectors,  and  the  red  points  are  the  vertices  of  the  simplex  and  correspond  to 
the  endmembers.  Note  that  the  inference  of  the  mixing  matrix  E  is  equivalent  to  identifying  the 
vertices  of  the  simplex.  This  is  referred  to  as  geometrical-based  unmixing. 

ei 


Figure  1.4:  Illustration  of  the  2-simplex.  Green  points  represent  spectral  vectors.  Red  points 
represent  vertices  of  the  simplex  and  correspond  to  the  endmembers. 

Given  a  data  set  X  (N  x  d)  containing  N  d-dimensional  spectral  vectors,  the  linear  hy- 
perspectral  unmixing  problem,  with  reference  to  the  linear  model  (1-3),  consists  of  estimating  the 
mixing  matrix  E  and  the  fractional  abundance  vectors  p^  for  each  pixel  j  =  1, 

1.3  Motivations  and  overview  of  the  proposed  research 
1.3.1  Motivations 

Most  of  the  existing  linear  spectral  unmixing  algorithms  assume  that  the  hyperspectral  data 
points  lie  in  a  single  convex  region  with  one  set  of  endmembers.  However,  it  may  be  the  case  that 
multiple  sets  of  endmembers,  defining  several  overlapping  convex  regions,  can  better  describe  the 
hyperspectral  image.  This  issue  has  been  addressed  in  [49-53],  where  the  linear  mixing  model  has 
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been  extended  to  multiple  sets  of  endmembers.  Each  endmember  set  is  found  using  the  convex 
geometry  model  resulting  in  a  piece-wise  convex  representation  of  the  hyperspectral  data.  Another 
limitation  of  existing  spectral  unmixing  algorithms  is  that  they  do  not  take  into  account  the  distri¬ 
bution  of  the  data  in  the  spectral  space  while  unmixing.  This  is  the  case  even  for  the  piece-wise 
convex  representation  in  [49]. 

To  address  the  above  limitations  of  linear  spectral  unmixing,  we  propose  a  local  hyperspectral 
unmixing  algorithm,  called  Context  Dependent  Spectral  Unmixing  (CDSU).  CDSU  takes  into  account 
the  distribution  of  the  data  in  the  spectral  space  while  identifying  multiple  sets  of  endmembers.  In 
other  words,  the  unmixing  process  is  adapted  to  different  regions  of  the  spectral  space. 

Another  challenge  with  most  unmixing  algorithms  is  that  they  require  the  knowledge  of  the 
number  of  endmembers  to  be  extracted  before  hand.  Moreover,  different  algorithms  have  different 
assumptions  and  modes  of  operation,  usually  yielding  different  results.  Even  the  same  algorithm 
may  not  result  in  the  same  endmembers  when  run  multiple  times.  This  is  mainly  due  to  the  non- 
deterministic  behavior  of  the  algorithm.  To  address  this  limitation,  we  investigate  using  multiple 
algorithms  with  different  parameters  to  identify  an  accurate  and  consistent  set  of  endmembers  using 
consensus  analysis. 

Spectral  unmixing  is  a  goal  in  itself,  where  one  is  interested  in  identifying  the  materials 
present  in  the  scene.  Unmixing  is  also  an  initial  step  to  other  hyperspectral  imaging  applications, 
such  as  target  detection.  In  fact,  spectral  unmixing  is  used  to  describe  the  background  with  a 
set  of  endmembers,  based  on  which  a  detection  statistic  is  computed  for  every  pixel.  Background 
description  is  of  paramount  importance  in  target  detection.  A  better  description  allows  for  a  better 
detection.  Hence,  we  propose  using  the  context  dependent  unmixing  framework  to  design  a  new 
class  of  detectors  called  Context  Dependent  Target  Detectors. 

1.3.2  Contributions 

Our  main  contributions  can  be  summarized  as  follows: 

•  We  propose  a  Context  Dependent  Spectral  Unmixing  (CDSU)  [54]  algorithm.  CDSU  is  based 
on  optimizing  an  objective  function  that  combines  context  identification  and  spectral  unmixing. 
The  context  or  region  identification  component  thrives  to  partition  the  input  spectral  space 
into  different  clusters  (called  contexts).  The  spectral  unmixing  component  thrives  to  learn 
optimal  endmembers  and  abundances  within  each  cluster. 
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•  We  propose  an  extension  to  CDSU  using  the  Mahalanobis  distance  (CDSUm)  [55].  CDSUm 
supports  non-spherical  cluster  shapes. 

•  We  propose  two  semi-supervised  versions  of  CDSU  that  use  partial  supervision  information  to 
constrain  the  problem,  guide  the  optimization  and  narrow  the  space  of  possible  solutions.  The 
Cluster  Constrained  Multi-Model  Unmixing  (CC-MMU)  [55]  algorithm  uses  cluster  assignment 
constraints  on  the  pixels,  while  the  Proportion  Constrained  Multi-Model  Unmixing  (PC-MMU) 
algorithm  uses  constraints  on  the  proportions  of  the  pixels. 

•  We  propose  a  Robust  Context  Dependent  Spectral  Unmixing  (RCDSU)  [56]  algorithm.  RCDSU 
handles  noise  and  outliers  in  the  data,  and  finds  the  optimal  number  of  contexts  in  an  unsu¬ 
pervised  way  (U-RCDSU)  [56]. 

•  We  propose  a  robust  unmixing  approach  based  on  consensus  analysis  [57].  We  run  multiple 
unmixing  algorithms  using  different  parameters,  and  the  goal  is  to  find  a  consensus  unmixing 
by  combining  the  unmixing  ensemble  resulting  from  those  algorithms. 

•  We  propose  a  new  class  of  target  detection  algorithms,  called  Context  Dependent  Target 
Detectors  [58],  that  takes  advantage  of  the  context  dependent  unmixing  framework.  The 
detection  is  performed  locally  within  the  extracted  contexts,  and  a  global  detection  statistic 
is  computed  as  a  weighted  sum  of  the  local  scores. 

The  remainder  of  this  dissertation  is  organized  as  follows.  Chapter  2  provides  a  review 
of  some  linear  spectral  unmixing  and  target  detection  algorithms.  Chapter  3  introduces  the  pro¬ 
posed  context  dependent  spectral  unmixing  algorithm  and  its  variations.  Chapter  4  introduces  the 
proposed  robust  unmixing  using  consensus  analysis.  Chapter  5  introduces  the  proposed  context 
dependent  target  detection  algorithms.  Chapter  6  provides  experimental  results  and  analyses  of  the 
proposed  methods.  Finally,  chapter  7  provides  conclusions  and  potential  future  work. 
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CHAPTER  2 


LITERATURE  REVIEW 

This  chapter  provides  a  review  of  some  existing  linear  hyperspectral  unmixing  and  target 
detection  algorithms.  Methods  that  are  relevant  to  our  work  are  described  in  details. 

2.1  Linear  spectral  unmixing 

Spectral  unmixing  algorithms  can  be  categorized  into  two  main  categories:  pure  pixel  based 
and  minimum  volume  based  approaches. 

2.1.1  Pure  pixel  based  unmixing  algorithms 

The  pure  pixel  based  algorithms  belong  to  the  minimum  volume  based  approaches.  They 
have  the  additional  assumption  that  the  data  have  at  least  one  pure  pixel  per  endmember.  In  other 
words,  they  assume  that  there  is  at  least  one  spectral  vector  on  each  vertex  of  the  data  simplex.  This 
assumption  is  a  strong  requisite  that  may  not  hold  in  many  datasets.  These  algorithms  find  the  set  of 
purest  pixels  in  the  data.  They  have  been  often  used  in  linear  hyperspectral  unmixing  applications, 
mainly  because  of  their  light  computational  burden  and  clear  conceptual  meaning  [29].  Algorithms 
relying  on  the  pixel  purity  assumption  include  the  Pixel  Purity  Index  (PPI)  algorithm  [59]  and  the 
N-FINDR  algorithm  [60],  both  of  which  are  described  in  the  next  subsections. 

2. 1.1.1  The  Pixel  Purity  Index 

The  Pixel  Purity  Index  (PPI)  [59]  is  a  commonly  used  algorithm  for  determining  the  purest 
pixels  in  a  given  hyperspectral  image.  It  ranks  image  pixels  based  on  their  purity  indices.  Then, 
the  M  pixels  with  the  highest  purity  values  are  returned  as  potential  endmembers.  The  number  of 
endmembers,  M .  is  assumed  to  be  known.  PPI  is  often  used  for  generating  candidate  endmembers 
which  are  then  used  as  inputs  to  other  endmember  extraction  algorithms  [61]  or  loaded  into  a 
visualization  tool  for  users  to  manually  select  endmembers  from  the  list  of  potential  candidates  [62] . 
The  PPI  algorithm  assigns  each  pixel  a  purity  value  by  repeatedly  projecting  all  of  the  pixels  onto 


skewers,  defined  as  a  large  set  of  random  vectors.  The  algorithm  is  initialized  by  setting  the  purity  of 
each  pixel  to  zero.  The  pixel  purity  values  are  updated  following  each  random  projection  by  adding 
one  to  the  values  of  the  pixels  that  fall  near  either  end  of  every  projection.  Since  PPI  values  are 
generated  using  random  vectors,  the  results  are  dependent  on  the  number  of  random  projections 
and  the  threshold  for  determining  if  a  pixel’s  projection  is  considered  near  an  end-point  [59]. 

2. 1.1.2  N-FINDR 

N-FINDR  [60]  is  based  on  the  fact  that  the  volume  defined  by  a  simplex  formed  by  the 
purest  pixels  is  larger  than  any  other  volume  defined  by  any  other  combination  of  pixels.  This 
algorithm  finds  the  set  of  pixels  defining  the  largest  volume  by  inflating  a  simplex  inside  the  data. 
The  algorithm  begins  by  randomly  selecting  a  set  of  M  pixels  from  the  image  to  be  the  initial 
endmember  set  E.  Then,  each  endmember  is  replaced,  in  succession,  by  all  other  pixels  in  the 
image.  After  each  replacement,  the  volume  of  the  space  defined  by  the  current  set  of  potential 
endmembers  is  computed.  When  a  replacement  increases  the  volume,  the  replacement  is  maintained. 
The  algorithm  cycles  through  all  image  pixels  and  endmembers  until  no  further  replacements  are 
made. 

The  volume  enclosed  by  each  set  of  potential  endmembers  is  computed  using: 

F(E*)=  (M-l)!aM|El)’  (2'1} 

where  abs(.)  refers  to  the  absolute  value,  |.|  refers  to  the  determinant,  and 

E*  =  [ImxIi  E].  (2.2) 

If  the  dimensionality  of  the  data  is  larger  than  (M  —  1),  then  a  dimensionality  reduction  method, 
such  as  Principal  Components  Analysis  or  Maximum  Noise  Fraction,  must  be  employed  [63,64].  The 
data  dimensionality  must  be  one  less  than  the  desired  number  of  endmembers  since  the  determinant 
of  a  non-square  matrix  is  not  defined  [60] .  In  addition  to  assuming  that  pure  pixels  can  be  found  in 
the  image,  N-FINDR  requires  the  knowledge  of  the  number  of  endmembers  in  advance. 

Other  pure  pixel  based  algorithms  include  the  Iterative  Error  Analysis  (IEA)  algorithm  [65] , 
the  Vertex  Component  Analysis  (VC A)  algorithm  [66],  the  Simplex  Growing  Algorithm  (SGA)  [67], 
and  the  Sequential  Maximum  Angle  Convex  Cone  (SMACC)  algorithm  [68].  The  IEA  implements 
a  series  of  linear  constrained  unmixings,  each  time  choosing  as  endmembers  those  pixels  which  min¬ 
imize  the  remaining  error  in  the  unmixed  image.  The  VCA  algorithm  iteratively  projects  data  onto 
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a  direction  orthogonal  to  the  subspace  spanned  by  the  detected  endmembers.  The  new  endmember 
signature  corresponds  to  the  maximum  of  the  projection.  The  algorithm  iterates  until  all  endmem¬ 
bers  are  exhausted.  The  SGA  iteratively  grows  a  simplex  by  finding  the  vertices,  one  at  a  time, 
corresponding  to  the  maximum  volume.  The  SMACC  algorithm  represents  the  endmembers  using 
a  convex  cone.  It  starts  with  a  single  endmember  and  every  iteration  the  data  vector,  making  the 
maximum  angle  with  the  existing  cone,  is  chosen  as  the  next  endmember.  The  algorithm  terminates 
when  all  of  the  data  vectors  are  within  the  convex  cone,  to  some  tolerance.  A  quantitative  and 
comparative  analysis  of  these  methods  is  given  in  [39]. 

2.1.2  Minimum  volume  based  unmixing  algorithms 

The  minimum  volume  approaches  seek  a  mixing  matrix  E  that  minimizes  the  volume  of  the 
simplex  defined  by  its  rows  (endmembers),  subject  to  the  constraint  that  it  contains  the  observed 
spectral  vectors.  The  pure  pixel  constraint  is  no  longer  enforced. 

Examples  of  minimum  volume  based  algorithms  include  the  Nonnegative  Matrix  Factor¬ 
ization  Minimum  Volume  Transform  (NMF-MVT)  algorithm  [69],  the  Minimum  Volume  Simplex 
Analysis  (MVSA)  algorithm  [70],  and  the  Simplex  Identification  via  Split  Augmented  Lagrangian 
(SISAL)  algorithm  [71]. 

In  the  following,  we  focus  on  two  minimum  volume  based  algorithms  that  are  closely  related 
to  our  proposed  approach.  The  first  one  is  the  Iterated  Constrained  Endmembers  (ICE)  algorithm 
[61]  which  fits  a  simplex  to  the  data  while  penalizing  its  volume.  The  second  one  is  the  Piece- 
wise  Convex  Multiple  Model  Endmember  Detection  (P-COMMEND)  algorithm  [49]  which  models 
a  hyperspectral  image  using  a  piece-wise  convex  representation. 

2. 1.2.1  ICE:  Iterated  Constrained  Endmembers 

The  ICE  algorithm  [61]  is  based  on  the  joint  optimization  of  two  terms.  The  first  term  is  the 
residual  sum  of  squares  ( RSS )  based  on  the  convex  geometry  model  in  equation  (1.1).  This  term  is 
defined  as 

n  /  m  \  /  m  \T 

RSS  =  'y  ]  I  x.j  —  y  yPjkek  J  I  xj  —  y  ]pjkek  I  .)  (2-3) 

i=i  V  fc= i  /  V  *= i  / 

where  Xj  (lx  d)  is  the  spectral  signature  of  pixel  j,  d  being  the  number  of  spectral  bands,  N  is  the 
number  of  pixels  in  the  image,  M  is  the  number  of  endmembers,  pjk  is  the  proportion  of  endmember 
k  for  pixel  j,  and  (1  x  d)  is  the  fcth  endmember.  The  proportions  satisfy  the  constraints  in 
equation  (1.2). 
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The  second  term  of  the  ICE  algorithm  is  the  sum  of  squared  distances  ( SSD )  between  all  the 
simplex  vertices.  This  term  represents  an  approximation  of  the  volume  of  the  simplex.  Therefore, 
by  adding  this  term,  ICE  finds  endmembers  that  provide  a  tight  fit  around  the  data.  The  SSD  term 
is  defined  as 

M— 1  M 

SSD  =  ^2  X!  (efc  -  e;)(efc  -  e;)T.  (2.4) 

k—1  l=k-\- 1 

The  objective  function  of  the  ICE  algorithm  is  a  weighted  sum  of  both  RSS  and  SSD  terms  [62]. 


It  is  defined  as 


,  .  RSS  SSD 

J=(l-/z)-—  +  »-. 


N  '  ^  M(M  —  1)  ’  ^2'5') 

where  /i  is  a  tradeoff  or  a  regularization  parameter  in  (0,1)  used  to  balance  the  RSS  and  SSD 

terms.  In  [61],  the  authors  recommend  using  one  value  of  p  for  all  datasets.  For  this  to  be  possible, 
the  objective  function  should  be  approximately  independent  of  the  sample  size  N  and  the  number 
of  endmembers  M.  This  is  the  reason  for  normalizing  RSS  by  N  and  SSD  by  M(M  —  1)  in  (2.5). 

It  is  instructive  to  consider  the  limiting  behavior  of  /i  near  its  extreme  values,  0  and  1.  As  /.t 
tends  to  0,  the  limiting  solution  is  an  (M  —  l)-simplex  which  totally  encloses  the  data  points  while 
ignoring  the  tightness  of  fit.  On  the  other  hand,  as  /i  tends  to  1,  the  algorithm  results  in  a  trivial 
solution  where  all  the  endmembers  converge  to  one  point,  the  mean  of  the  data. 

The  ICE  algorithm  minimizes  the  objective  function  in  (2.5)  iteratively.  First,  given  end- 
member  estimates,  the  proportions  for  each  pixel  are  estimated.  Initially,  endmembers  may  be  set 
to  randomly  chosen  pixels  from  the  image.  Estimating  the  proportions  requires  a  least  squares  min¬ 
imization  of  each  term  in  equation  (2.3).  Since  each  of  these  terms  is  quadratic  and  subjected  to 
the  linear  constraints  in  equation  (1.2),  the  minimization  can  be  achieved  using  quadratic  program¬ 
ming.  After  solving  for  the  proportions,  the  endmembers  are  updated  using  the  current  proportion 
estimates  [61]. 

Like  most  iterative  solutions  to  nonlinear  continuous  parameter  optimization  problems,  the 
ICE  algorithm  will  asymptotically  approach  a  local  minimum  of  the  objective  function  [61].  There¬ 
fore,  the  iterative  procedure  is  stopped  when  the  estimated  parameters  do  not  change  significatively 
between  successive  iterations.  Algorithm  2.1  illustrates  the  steps  of  the  ICE  algorithm. 

Although  ICE  is  an  effective  algorithm  for  finding  endmembers,  it  can  provide  only  a  single 
set  of  endmembers  for  the  entire  input  data  set.  This  may  not  provide  a  compact  description  of 
the  hyperspectral  scene.  An  attempt  to  alleviate  this  shortcoming  was  proposed  in  [49],  where  the 
linear  mixture  model  has  been  extended  to  multiple  sets  of  endmembers.  It  is  called  the  Piece-wise 
Convex  Multiple  Model  Endmember  Detection  (P-COMMEND)  algorithm. 


11 


Algorithm  2.1  Iterated  Constrained  Endmembers 

Inputs:  X:  the  data  points  (N  x  d). 

M:  the  number  of  endmembers. 

/z:  the  regularization  parameter  /z  G  (0, 1). 

Outputs:  E:  the  estimated  endmembers. 

P:  the  estimated  proportions. 

Initialize  E 
repeat 
Update  P. 

Update  E. 

until  parameters  do  not  change  significatively 

return  E,  P 

2. 1.2.2  P-COMMEND:  Piece-wise  Convex  Multiple  Model  Endmember  Detection 

P-COMMEND  [49]  is  a  hyperspectral  unmixing  algorithm  that  finds  multiple  sets  of  end- 
members.  It  models  a  hyperspectral  image  using  a  piece-wise  convex  representation  to  characterize 
non-convex  hyperspectral  data.  It  assumes  that  a  hyperspectral  scene  contains  multiple  distinct 
regions  that  do  not  share  common  materials.  Each  region  is  defined  by  a  simplex  with  a  set  of 
endmembers. 

P-COMMEND  estimates  endmember  sets  and  proportion  values  by  minimizing 

G  (  N  M—l  M  \ 

J  =  ^2  uij  “  p bEi)  (XJ  -  Piz'Ezf  +  a  ^2  ^2  (eik  -  e,;;)(e4fc  -  e.a)T  ,  (2.6) 

1—1  \j—1  k- 1  i=fe+ 1  j 

subject  to 

c 

Uij  G  [0,l],Vi,j,  =  1  ,Wj  (2.7) 

i= 1 

and 

Pi j  >  0,  and  lixMP fj  =  1  ,Vz,j.  (2.8) 

where  Xj  (1  x  d)  is  the  spectral  signature  of  pixel  j ,  d  is  the  number  of  spectral  bands,  N  is  the 
number  of  pixels  in  the  image,  M  is  the  number  of  endmembers,  and  C  is  the  number  of  models, 
convex  regions,  or  sets  of  endmembers.  In  (2.6),  p,j  (1  x  M)  is  the  vector  of  proportions  associated 
with  pixel  j  with  respect  to  model  i,  E,  (M  x  d)  is  the  mixing  matrix  corresponding  to  model  i, 
and  (1  x  d)  is  the  fcth  row  of  E$  representing  the  fcth  endmember  in  the  ith  endmember  set.  In 
(2.6),  represents  the  membership  of  pixel  j  in  model  i,  indicating  the  degree  to  which  pixel  j 
contributes  to  the  endmembers  of  convex  set  i.  Finally,  m  G  (1,  +oo)  is  a  fuzzifier  controlling  the 
degree  of  sharing  between  the  models,  and  a  is  a  fixed  parameter  used  to  balance  the  two  terms  of 
the  objective  function. 
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The  first  term  in  (2.6)  computes  the  residual  sum  of  squares  (RSS)  between  each  input 
hyperspectral  data  point  and  its  estimate  using  the  estimated  endmembers  and  proportion  values. 
The  second  term  is  used  to  constrain  the  size  of  each  simplex  by  minimizing  the  sum  of  squared 
distances  (SSD)  between  each  pair  of  endmembers  within  each  set.  Note  that,  when  C  =  1,  the 
P-COMMEND  objective  function  reduces  to  the  ICE  objective  function  in  equation  (2.5)  (to  a 
normalization  factor). 

In  [49],  the  authors  showed  that,  using  Lagrange  multipliers  optimization  along  with  the 
Karush-Kuhn-Tucker  (KKT)  conditions,  the  objective  function  in  (2.6)  can  be  minimized  by  updat¬ 
ing  the  endmembers,  the  proportions  and  the  fuzzy  memberships  using 

Ej  =  q(MImxm  —  3-mxm)  + 


<PijPu  Y  'O’-A. 

3= 1  3= 1 


p  fj  =  max 


p1?t]  1  bp  t  ,  1  1ixA/(EjEi  )  EjXj-  -I  \ 

+  — j - pTi-ii - 1mx1  ’U  ’ 

L  J  L  J-l xM^i&i  j  J-Mxl  J  ' 


1 

(xj  -  P ij E * )  (xj  -  Pij E?;  )T]  1“m 

uij  ~  c  i  • 

E  [(xj  —  P —  PgjE9)T]  1~m 
9=1 

In  (2.9)  and  (2.10),  the  notation  [A]-1  refers  to  the  inverse  of  matrix  A. 
The  P-COMMEND  algorithm  is  outlined  in  Algorithm  2.2  [49]. 


Algorithm  2.2  Piece-wise  Convex  Multiple  Model  Endmember  Detection 

Inputs:  X:  the  data  points  ( N  x  d). 

C :  the  number  of  models. 

M :  the  number  of  endmembers  for  each  model, 
m:  the  fuzzifier,  m  £  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 

Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples. 

Ex  the  sets  of  endmembers  in  all  models. 

P.p  the  sets  of  proportions  in  all  models. 

Initialize  U  and  E*. 

repeat 

Update  P,  using  (2.10). 

Update  Ej  using  (2.9). 

Update  U  using  (2.11). 
until  parameters  do  not  change  significatively 

return  U,  E,,  P, 

In  the  implementation  of  the  P-COMMEND  algorithm  [49],  the  membership  values  are 
initialized  using  the  Fuzzy  C-Means  algorithm  [72]  (which  is,  in  turn,  randomly  initialized),  and  the 
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endmember  sets  E.(  are  initialized  using  the  Minimum  Volume  Simplex  Analysis  (MVSA)  algorithm 
[70].  The  algorithm  is  stopped  whenever  the  estimated  parameters  do  not  change  significatively 
between  successive  iterations. 


Other  variations  of  the  P-COMMEND  algorithm  have  been  proposed.  These  include  a 
multiple  model  endmember  detection  algorithm  based  on  spectral  and  spatial  information  [50],  a 
spatially-smooth  piece- wise  convex  endmember  detection  algorithm  [51],  a  competitive  agglomera¬ 
tion  piece- wise  convex  multiple  model  endmember  detection  algorithm  [52] ,  and  a  piece- wise  convex 
spatial-spectral  unmixing  algorithm  using  possibilistic  and  fuzzy  clustering  [53]. 

The  multiple  model  endmember  detection  algorithm  based  on  spectral  and  spatial  informa¬ 
tion  [50]  adds  a  spatial  information  term  to  the  P-COMMEND  objective  function.  The  idea  is  to 
fit  the  hyperspectral  data  using  a  convex  geometry  model  locally.  The  assignment  of  a  point  to  a 
model  is  done  according  to  its  spectral  and  spatial  information.  Using  the  same  notation  as  for  the 
P-COMMEND  objective  function  above,  the  objective  function  is  defined  as: 


c  r  n  n 

J=Z  E  <  (xi  -  PijE*)  (xj  -  PijEif  +  /)  V  u%{yj  -  c,){y,  -  <•,)' 

i=\  L  j—1  j= 1 

M—l  M 

TO:  E  E  {&ik  ) 

k—1  l—k+ 1 


(2.12) 


subject  to  the  constraints  in  (2.7)  and  (2.8). 

In  (2.12),  y:j  is  the  2-dimensional  spatial  coordinate  vector  of  pixel  j.  c,  is  the  spatial  center  of 
the  points  assigned  to  the  *th  model,  and  p  is  a  scaling  parameter.  Using  Lagrange  multipliers 
optimization  along  with  the  Karush-Kuhn- Tucker  (KKT)  conditions,  it  was  shown  [50]  that  the 
objective  function  in  (2.12)  can  be  minimized  by  updating  the  endmembers  and  the  proportions 
using  (2.9),  (2.10),  and  the  fuzzy  memberships  and  centers  using 

[Ob  ^  P,jEi)(xi  -  PijEj)T  +  p{ yj  -  Ci)(yj  -  c,;)'i]  ^ 

? 

\T  J_  n(„.  _  Uv.  _  r-  '|Tlr=^ 


c 

E 

9=1 


(2.13) 


and 


E  [(XJ  -  P«E9)(Xj  -  Pqj'EqV  +  P{y j  ~  Cq){y j  ~  Cq)T]' 


E 

j=  1 

N  * 
j= 1 


C  i  = 


(2.14) 


The  spatially-smooth  piece- wise  convex  endmember  detection  algorithm  (Spatial  P-COMMEND) 
[51]  extends  P-COMMEND  by  incorporating  spatial  information  to  aid  in  estimating  endmembers 
and  abundance  values.  Spatial  information  is  incorporated  by  encouraging  neighboring  pixels  in  the 
image  to  have  similar  membership  values  to  the  different  convex  regions.  This  is  accomplished  by 
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adopting  the  spatially-smooth  Fuzzy  Local  Information  C-Means  method,  FLICM,  developed  in  [73] . 
The  FLICM  algorithm  adds  a  fuzzy  factor  term,  G,  to  the  objective  function  that  influences  the 
updates  of  the  membership  values  by  incorporating  spatial  information.  The  G  term,  adapted  for 
use  in  the  Spatial  P-COMMEND  algorithm,  is  defined  as: 


Gij  =  ^2  d  y(l  -  uik)m\ |xfc  -  PifcE, |||, 


where  j  is  the  center  pixel  in  the  local  window  under  consideration,  Nj  is  the  neighborhood  around 
the  center  pixel  (such  as  a  3  x  3  window),  and  djk  is  the  Euclidean  distance  in  pixel  space  of  the 
image  indices  between  pixels  j  and  k.  Therefore,  the  G  term  scales  the  influence  of  neighboring 
pixels  based  on  their  distances  to  the  center  pixel  in  the  index  space.  When  a  neighboring  pixel  has 
high  membership  in  a  convex  region,  the  center  pixel  under  consideration  is  encouraged  to  also  have 
a  high  membership  in  that  region.  The  fuzzy  factor  term  is  updated  every  iteration  and  treated 
as  a  constant  during  the  updates  of  the  endmembers,  abundances  and  membership  values.  Adding 
the  term  in  (2.15)  and  using  the  same  notation  as  for  the  P-COMMEND  objective  function,  the 

objective  function  of  the  Spatial  P-COMMEND  is  defined  as: 

c  /  N  M—l  m  \ 

J  —  '22  f  ^2uij  [(xj  ”  P  lv  !  (xj  -  p.  K  ) 1  +  Gij  j  +  a  ^2  ^2  (eifc  “  e«)(e>fc  -  e«)T  )  ,  (2-16) 

subject  to  the  constraints  in  (2.7)  and  (2.8). 

In  Spatial  P-COMMEND,  the  update  equations  for  the  endmembers  and  abundances  remain  the 
same  as  in  equations  (2.9)  and  (2.10)  respectively.  The  membership  update  equation  becomes 

[(Xj  p, J  K; )( x(  •  G,,  ^ 

utj  —  c  .  G-1'! 

S  [(xi  —  Pgj®g)(xi  —  P qj^q)T  +  Gqj\ 1_m 

9=1 

The  Spatial  P-COMMEND  algorithm  performs  alternating  optimization  on  the  endmembers,  abun¬ 
dances  and  memberships  until  some  stopping  criterion  is  reached  such  as  convergence  or  a  maximum 
number  of  iterations. 


The  competitive  agglomeration  piece-wise  convex  multiple  model  endmember  detection  al¬ 
gorithm  (CAP)  [52]  integrates  the  competitive  agglomeration  algorithm  [74]  into  P-COMMEND  in 
order  to  estimate  the  number  of  convex  regions  needed  for  a  given  data  set.  The  competitive  ag¬ 
glomeration  algorithm  uses  a  regularization  term  with  the  sum  of  squares  of  the  cardinalities  (sum 
of  the  memberships)  of  the  clusters.  Using  the  same  notation  as  for  the  P-COMMEND  objective 
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function,  the  objective  function  of  CAP  is  defined  as: 


c  r  n 


■)  E  E  "m  (xi  -  P«E<)  (XJ  -  PijE*) 


*=i  Lj=i 
M— 1  M 


M—  1  M  /  N  \ 

+«  E  E  (eifc  -  eu){eik  -  eu)T  -  /?(  E 

fc=l  /— /c+1  ^  ,7=1  ' 


(2.18) 


subject  to  the  constraints  in  (2.7)  and  (2.8).  In  (2.18),  /3  is  a  scaling  parameter. 

The  update  equations  for  the  endmenrbers  and  proportion  values  remain  the  same  as  in  P-COMMEND 
(equations  (2.9)  and  (2.10)  respectively)  with  a  fuzzifier  value  of  2.  The  update  equation  for  the 
membership  values  becomes 

PNi  +  Xi  (2.19) 


iXj  p, ,  K,  )i  x,  p. ,  K, ) 7  ' 


where 

1  _  ^  E  ||Xj-^Efc||r' 

A,  = - - ,  (2.20) 

y  _ 1 _ 

fc=l  !lxJ~PfcJEfclli 

and 

N 

Nk  =  (2.21) 

3= 1 

Competitive  agglomeration  encourages  sparsity  in  the  membership  values.  When  the  membership 
values  associated  with  a  single  convex  region  drop  below  a  prescribed  threshold,  the  convex  region 
can  be  removed.  Following  the  discussion  in  [74],  the  parameter  for  the  competitive  agglomeration 
term  is  adjusted  every  iteration  using  the  following  annealing  schedule: 

C  N 

E  E  uij  llxi  -  Pq-Eilli 

m  =  - ,  (2.22) 

E  (  E  ua ) 

i= 1  v  j= 1  / 

where  t  is  the  iteration  number,  and  r  and  0o  are  constants.  This  schedule  for  the  /3  parameter  tries 
to  balance  the  residual  error  term  and  the  competitive  agglomeration  term  while  giving  a  larger 
weight  to  the  error  term  as  the  number  of  iterations  increases. 

The  piece-wise  convex  spatial-spectral  unmixing  algorithm  using  possibilistic  and  fuzzy  clus¬ 
tering  [53]  associates  both  fuzzy  and  typicality  membership  values  with  each  pixel  to  learn  sets  of 
endmenrbers  (i.e.  convex  regions).  This  algorithm  integrates  concepts  from  the  Fuzzy  Local  Infor¬ 
mation  C-Means  (FLICM)  method  [73],  and  the  Possibilistic  Fuzzy  C-Means  (PFCM)  method  [75] 
into  the  P-COMMEND  objective  function.  Using  the  same  notation  as  for  P-COMMEND,  the 
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objective  function  is  defined  as: 


c  r  n  ,  \ 

j  =  E  E  (aw™  [(Xj  -  P//E,* )(Xy  -  p,,K,  ir  +  Gij]  +  bl’^xj  -  p,.,E,)(xy  -  p,,E,!7  ) 

i=l  1  j= 1  v  7 

M— 1  M  N 

+a  E  E  (eife  -  eu)(eik  -  e.j/)T  +  7i  E  (!  -  UjY 

k— 1  /— /c+1  j— 1 

subject  to  the  constraints  in  (2.7),  (2.8),  and 

C 

Uj  G  [0, 1], Vi, j,  and  <  1  ,Vj. 

i=l 


(2.23) 


(2.24) 


In  (2.23),  a,  b,  n  >  1  and  7*  are  positive  constants. 

In  [53],  it  was  shown  that  the  optimization  of  (2.23)  yields  an  update  equation  for  the  proportions 
that  is  the  same  as  the  P-COMMEND  algorithm  (equation  (2.10)).  Similarly,  the  membership 
update  equation  is  as  in  (2.17).  The  endmembers  need  to  be  updated  using 


(2.25) 


N 

-1 

N 

E<  = 

cx(M1Mxm  -  1a/xm)  +  y~^(a uTi  +  bt^pJjPij 

j=i 

and  the  update  equation  for  the  typicality  is 


tij  — 


,  (Er) ' 


(2.26) 


1  +  (tt|Ixj  -PpEilll) 

In  the  above  equation,  the  7 ,  values  are  set  to  be  the  mean  of  the  ||xj  —  p.^EjUl  values  for  all  the 
pixels  in  the  associated  convex  region.  Therefore,  7 i  are  updated  in  every  iteration. 

Although  the  P-COMMEND  algorithm  and  its  variations  allow  the  detection  of  multiple 
sets  of  endmembers  for  a  hyperspectral  image,  they  do  not  take  into  account  the  distribution  of  the 
data  in  the  spectral  space.  In  fact,  the  estimated  endnrember  sets  are  the  result  of  the  piece-wise 
convex  and  spatial  representation  of  the  data.  Thus,  they  do  not  have  a  spectral  meaning. 

In  the  next  chapter,  we  introduce  our  approach  for  hyperspectral  unmixing  which  alleviates 
this  shortcoming  by  taking  into  account  the  distribution  of  the  data  in  the  spectral  space  while 
unmixing. 


2.2  Target  detection  using  hyperspectral  imaging 

Besides  material  identification  through  spectral  unmixing,  remote  sensing  using  hyperspec¬ 
tral  images  is  used  for  a  variety  of  other  civilian  and  military  applications.  These  can  be  categorized 
into  4  main  tasks  [76]: 
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•  Target  detection:  searching  the  pixels  of  a  hyperspectral  image  for  “rare”  pixels  with  known 
spectral  signatures, 

•  Anomaly  detection:  searching  the  pixels  of  a  hyperspectral  image  for  pixels  whose  spectra 
significantly  differ  from  the  local  background, 

•  Change  detection:  finding  the  “significant”  spectral  changes  over  time  between  two  hyperspec¬ 
tral  images  of  the  same  geographic  region, 

•  Classification:  assigning  a  label  to  each  pixel  of  a  hyperspectral  image. 

In  this  work,  we  are  interested  in  target  detection.  Investigating  the  other  applications  is  a  potential 
future  work.  In  the  following,  we  review  the  use  of  hyperspectral  imaging  for  target  detection. 


2.2.1  Definitions 


A  target  is  defined  as  any  object  or  material  being  sought  in  a  hyperspectral  image.  Targets 
that  occupy  multiple  pixels  are  referred  to  as  multipixel  or  resolved  targets.  The  detection  of  resolved 
targets  can  exploit  spatial  and  spectral  properties  of  the  image.  Subpixel  targets,  on  the  other  hand, 
occupy  only  a  part  of  the  pixel.  The  remaining  part  is  filled  with  one  or  more  materials,  which 
are  collectively  referred  to  as  background  [28].  The  detection  of  subpixel  targets  exploits  spectral 
properties  only.  Hyperspectral  imaging  provides  an  invaluable  tool  for  subpixel  target  detection,  as 
it  can  identify  and  distinguish  between  different  materials  having  different  spectral  signatures. 

Typically,  the  number  of  targets  in  a  scene  is  too  small  to  support  the  estimation  of  the 
statistical  properties  of  the  target  class,  and  pattern  classification  algorithms  that  require  such 
information  are  hence  not  applicable. 

Instead,  the  design  and  evaluation  of  detection  algorithms  can  be  achieved  using  an  area  of 
statistics  known  as  binary  hypothesis  testing.  In  particular,  the  likelihood  ratio  (LR)  test  is  used 
for  many  target  detectors.  It  minimizes  the  risk  associated  with  incorrect  decisions,  and  leads  to 
detectors  that  are  optimum  for  a  wide  range  of  performance  criteria,  including  the  maximization  of 
separation  between  target  and  background  spectra  [28] . 

Given  an  observed  spectrum  x,  we  want  to  choose  between  two  competing  hypotheses: 


H0:  Target  absent 
Hi:  Target  present 


(2.27) 


18 


The  likelihood  ratio  is  the  ratio  of  the  conditional  probability  density  functions  under  the  two 
hypotheses: 


A(x)  = 


p(x|Hr) 


p(x|H0) ' 

If  A(x)  exceeds  a  certain  threshold  77,  then  Hi  is  selected  as  true.  This  suggests  the  detector 
shown  in  figure  2.1.  The  test  pixel  x  gets  mapped  into  a  scalar  y  =  A(x),  referred  to  as 


(2.28) 

structure 

detection 


Detector 
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y  =  A(x) 

Threshold 
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Figure  2.1:  Target  detector  structure. 


statistic ,  which  is  compared  to  r)  to  decide  whether  a  target  is  present. 

The  choice  of  rj  controls  the  number  of  correct  detections  and  the  number  of  detection  errors 
(target  misses  and  false  alarms).  This  is  illustrated  in  figure  2.2. 

Threshold 


Missed  targets  False  alarms 

Figure  2.2:  Threshold  selection  trade-offs.1 


There  is  a  compromise  between  choosing  a  low  threshold  to  increase  the  probability  of  detection  Pjj  , 
and  a  high  threshold  to  keep  the  probability  of  false  alarms  Pfa  low.  The  trade-off  between  P]j 

1This  image  was  taken  from  [28]. 
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and  Pfa  is  described  by  the  receiver  operating  characteristic  (ROC)  curve,  which  plots  Pd  versus 
Pfa  as  a  function  of  all  possible  values  of  the  threshold  rj.  Therefore,  ROC  curves  provide  a  means 
to  evaluate  the  detector  performance  or  compare  detectors  independently  of  threshold  selection. 

In  subpixel  target  detection,  the  spectrum  of  the  target  is  mixed  with  the  spectra  of  the 
background.  Finding  a  good  model  for  the  background  is  key  for  detection  [28].  Different  models 
lead  to  different  detectors.  If  a  statistical  distribution  is  used  to  model  the  background,  the  model 
is  said  to  be  unstructured  [28].  In  contrast,  when  a  subspace  is  used  to  model  the  background,  the 
model  is  said  to  be  structured  [28] .  In  the  following,  we  describe  few  subpixel  target  detectors  based 
on  each  kind  of  the  background  models,  with  more  emphasis  on  the  structured  background  model. 


2.2.2  Target  detection  using  unstructured  background  models 


In  this  case,  the  background  is  modeled  using  a  multivariate  normal  distribution  with  zero 
mean  and  covariance  matrix  T.  This  assumes  that  the  background  is  homogeneous.  Due  to  the 
zero  mean  background  assumption,  the  sample  mean  of  the  image  should  be  removed  from  all  image 
pixels  and  target  spectra  beforehand. 

A  well  known  unstructured  background  target  detector  is  the  adaptive  coherence/cosine 
estimator  (ACE)  detector  [77]: 


Tace{*)  = 


(str  1xT)2 


(2.29) 


(stf“1sf)(xf  ix71)’ 

where  x  is  the  spectral  signature  of  the  test  pixel,  st  is  the  spectrum  of  the  target,  and  T  is  the 
maximum  likelihood  estimate  of  the  covariance  matrix  T. 

If  we  consider  the  square-root  decomposition  of  T  =  f  sf  i ,  the  transformation  zT  =  T~5xt 
is  called  adaptive  whitening.  The  ACE  detector  can  be  expressed  as: 

(stzT)2 


r^CO-JMDL 


=  cos 


(2.30) 


(stsf)(zzT)  ||st||2||z||2 

where  sf  =  T~^sf  is  the  whitened  target  signature.  This  shows  that,  in  the  whitened  coordinate 
space,  Tace(x)  is  equal  to  the  cosine  square  of  the  angle  6  between  the  test  pixel  and  the  target. 

A  similar  unstructured  target  detector  was  developed  by  E.  J.  Kelly  [78]: 

(stf”1xT)2 


iWx)-  istt-isT){N  +  xt-ixTy 
where  N  denotes  the  number  of  pixels  in  the  image. 


(2.31) 
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2.2.3  Target  detection  using  structured  background  models 


If  a  subspace  is  used  to  model  the  background,  the  detection  problem  involves  choosing 
between  the  following  competing  hypotheses: 


H0:  x  =  pbE  +  w  (Target  absent) 

(2.32) 

Hi:  x  =  ptst  +  PbE  +  w  (Target  present) 

where  st  denotes  the  spectrum  of  the  target  (specified  by  the  user),  the  matrix  E  denotes  the 
background  subspace  (estimated  from  the  data),  the  vector  pb  and  the  scalar  pt  represent  the 
proportions  of  the  background  and  the  target  in  the  test  pixel  x,  and  w  is  an  error  term  assumed 
to  be  normally  distributed  with  zero  mean  and  covariance  Tw  =  <7^,1. 


2. 2. 3.1  Adaptive  Matched  Subspace  Detector  (AMSD) 

In  practice,  the  parameters  pb,  Pt  and  are  unknown.  Using  their  maximum  likelihood 
estimates  (MLE)  results  in  the  generalized  likelihood  ratio  (GLR)  [76]: 


GLR(x)  ^ 


xP„V 

xPs1XT 


d/2 


(2.33) 


where  d  is  the  dimension  of  x, 


Pb  =  I  —  EJ  (EEJ  )-1E 


(2.34) 


is  the  background  orthogonal  projection  error  matrix,  and  Ps-1  is  the  error  matrix  of  the  orthogonal 
projection  on  the  composite  target  and  background  space  S  =  [st;  E],  computed  by  replacing  E  with 
S  in  (2.34).  Notice  that  cr^  is  not  required  for  the  computation  of  GLR  because  it  cancels  out. 

The  GLR  in  equation  (2.33)  is  used  to  design  a  detection  statistic  called  Adaptive  Matched 
Subspace  Detector  (AMSD)  [76]: 

Pamsd(x)  =  b±  •  (2.35) 

xPs  x-1 

In  (2.35),  the  numerator  is  proportional  to  the  residual  of  the  projection  of  x  on  the  background 
space,  while  the  denominator  is  proportional  to  the  residual  of  the  projection  of  x  on  the  combined 
target  and  background  space.  Thus,  the  ratio  becomes  large  when  the  test  pixel  representation  in 
the  combined  target  and  background  space  S  is  better  than  its  representation  in  the  background 
space  E.  In  other  words,  the  larger  the  ratio  in  (2.35)  the  more  likely  that  pixel  x  contains  the 
target. 
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2. 2. 3. 2  Orthogonal  Subspace  Projection  (OSP)  Detector 


The  proportion  pt  of  the  target  st  in  pixel  x  can  be  estimated  as  [76]: 

siPb±xT 


Pt  = 


SfPb  s 


-1  „T  ' 


(2.36) 


where  Pb-1  is  the  same  as  in  equation  (2.34).  The  numerator  in  (2.36)  has  been  proposed  as  a 
detection  statistic  under  the  Orthogonal  Subspace  Projection  (OSP)  name  [76]: 


To sp(x)  =  stPb  xT. 


(2.37) 


The  operation  Pb^x^  removes  from  xT  the  part  which  belongs  to  the  background  E.  The  residual  is 
then  projected  on  the  target  st.  The  larger  this  quantity  is,  the  more  likely  the  test  pixel  x  contains 
the  target. 


2. 2. 3. 3  Hybrid  Subspace  Detector  (HSD) 


Unlike  OSP  and  AMSD,  which  perforin  an  unconstrained  least  squares  estimate  of  the  abun¬ 
dances  using  orthogonal  projection  [76],  the  Hybrid  Subspace  Detector  (HSD)  estimates  these  abun¬ 
dances  using  the  fully  constrained  least  squares  algorithm  [79].  This  assures  the  satisfaction  of  the 
non-negativity  and  sum-to-one  constraints  and  provides  a  meaning  to  the  abundances  [79].  HSD 
uses  the  following  detection  statistic: 


_  (x-pbE)r  1(x  —  pbE)T 
(x  -  psS)r~1(x  -  psS)T  ’ 


(2.38) 


where  E  is  the  background  subspace  matrix,  pb  is  the  estimate  of  the  abundances  of  the  test  pixel 
in  E,S  =  [st;E]  is  the  composite  target  and  background  space,  ps  is  the  estimate  of  the  abundances 
of  the  test  pixel  in  S,  and  T-1  denotes  the  inverse  of  the  covariance  matrix  of  the  background. 

The  OSP,  AMSD,  HSD,  and  other  structured  target  detectors  use  global  methods  to  model 
the  background  subspace.  These  methods  include  eigenvector  decomposition  of  the  data  correlation 
matrix  [76],  and  global  unmixing  algorithms  such  as  ICE  [61],  MVSA  [70],  and  NFINDR  [60].  These 
global  methods  may  not  provide  a  good  description  of  the  hyperspectral  data  when  the  scene  contains 
multiple  distinct  regions  that  do  not  share  common  materials.  In  this  case,  unmixing  methods  that 
can  learn  multiple  background  models,  that  correspond  to  regions  with  different  characteristics,  are 
more  appropriate.  Moreover,  most  global  unmixing  methods  face  the  challenge  of  target  leakage 
into  the  background,  which  is  the  contribution  of  the  target  pixels  to  the  background  model,  due  to 
their  presence  in  the  scene. 
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CHAPTER  3 


CONTEXT  DEPENDENT  SPECTRAL  UNMIXING 

In  this  chapter,  we  introduce  our  approach,  called  Context  Dependent  Spectral  Unmixing 
(CDSU),  to  hyperspectral  unmixing.  It  finds  multiple  sets  of  endmembers  and  takes  into  account 
the  distribution  of  the  data  in  the  spectral  space  while  unmixing.  First,  we  present  the  motiva¬ 
tions  behind  this  approach.  Then,  we  propose  a  novel  objective  function  and  derive  the  necessary 
conditions  to  optimize  it. 

3.1  Motivations 

To  motivate  our  proposed  approach,  we  present  three  examples  of  data  sets  and  analyze  the 
performance  of  the  previously  presented  ICE  and  P-COMMEND  algorithms  on  them.  The  data  sets 
are  chosen  to  be  2-dimensional  for  easier  visualization. 

3.1.1  Example  1 

In  this  example,  we  consider  a  2-dimensional  data  set  of  250  points  generated  using  the 
convex  geometry  model  in  (1.1).  A  zero  error  term  is  considered.  One  set  of  three  endmembers  is 
used: 


The  proportions  are  chosen  randomly  from  a  standard  uniform  distribution  and  are  normalized  to 
sum  to  one.  The  data  is  illustrated  in  figure  3.1(a).  Blue  points  denote  the  data  vectors  and  green 
points  denote  the  true  endmembers  used  to  generate  the  data  set. 

The  result  of  the  ICE  algorithm  on  this  data  set,  using  M  =  3  and  [i  =  0.0009,  is  shown  in  figure 
3.1(b).  The  detected  endmembers  are  shown  in  red.  The  result  of  the  P-COMMEND  algorithm, 
using  C  =  1,  M  =  3,  m  =  2,  and  a  =  0.01,  is  shown  in  figure  3.1(c).  The  detected  endmembers  are 
shown  in  red. 

As  it  can  be  seen,  for  this  simple  example,  both  ICE  and  P-COMMEND  succeeded  in  identifying 
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Figure  3.1:  Example  1:  (a)  Synthetic  two-dimensional  data  with  one  convex  hull,  (b)  Endmembers 
detected  by  the  ICE  algorithm,  (c)  Endmembers  detected  by  the  P-COMMEND  algorithm. 


endmembers  at  the  vertices  of  the  convex  hull  enclosing  the  data  points  providing  a  tight  fit  around 
them. 


3.1.2  Example  2 


In  this  example,  we  consider  a  2-dimensional  data  set  of  500  points  generated  using  the 
convex  geometry  model  in  (1.1)  with  a  zero  error  term.  For  this  example,  we  use  two  sets  of 
endmembers.  Each  set  has  three  endmembers  and  was  used  to  generate  250  points.  The  first  set  is 
the  same  as  in  (3.1),  and  the  second  one  is: 


Eo  — 


1  1 

2  1 

1.5  2.5 


(3.2) 


Similar  to  example  1,  the  proportions  are  chosen  randomly  from  a  standard  uniform  distribution 
and  are  normalized  to  sum  to  one.  This  data  is  plotted  in  figure  3.2(a),  where  blue  points  denote 
the  data  vectors  and  green  points  denote  the  true  endmembers  used  to  generate  the  data. 

The  results  of  the  ICE  algorithm  on  this  data  set,  using  M  =  3  and  M  =  6,  are  shown  in  figures 
3.2(b)  and  (c)  respectively.  A  value  of  /r  =  0.0009  was  used  for  both  cases.  The  detected  endmembers 
are  shown  in  red.  As  it  can  be  seen,  the  ICE  algorithm  failed  to  identify  correct  endmembers  for  this 
data  set.  This  is  due  to  the  fact  that  ICE  assumes  a  single  convex  region  for  the  entire  data,  and 
this  assumption  is  not  valid  for  this  data  set.  Consequently,  the  detected  erroneous  endmembers 
and  the  associated  abundances  will  have  a  negative  impact  on  any  further  processing  or  analysis 
based  on  them. 

The  result  of  the  P-COMMEND  algorithm,  using  C  =  2,  M  =  3,  m  =  2,  and  a  =  0.01,  is  shown  in 
figure  3.2(d).  The  detected  endmembers  are  shown  in  red  for  one  cluster  and  in  green  for  the  other 


24 


one.  As  it  can  be  seen,  P-COMMEND  succeeded  in  identifying  endmember  sets  at  the  vertices  of 


(c)  (d) 

Figure  3.2:  Example  2:  (a)  Synthetic  two-dimensional  data  with  two  convex  hulls,  (b)  Endmembers 
detected  by  the  ICE  algorithm  with  M  =  3.  (c)  Endmembers  detected  by  the  ICE  algorithm  with 
M  =  6.  (d)  Endmembers  detected  by  the  P-COMMEND  algorithm. 


the  convex  hulls  enclosing  the  data  points  and  thus,  providing  a  tight  fit  around  them. 


3.1.3  Example  3 

In  this  example,  we  consider  a  2-dimensional  data  set  of  1000  points,  arranged  so  that  they 
form  three  clusters.  These  are  illustrated  in  figure  3.3(a),  where  clusters  are  represented  using 
different  colors. 

We  run  the  P-COMMEND  algorithm  on  this  data  set,  using  C  =  3,  M  =  3,  m  =  2,  a  =  0.11 
and  using  the  true  cluster  assignment  in  figure  3.3(a)  as  the  initial  membership  values  U.  We  use 
this  initialization  to  illustrate  the  fact  that  the  results  of  P-COMMEND  are  not  due  to  convergence 
to  local  minima.  As  for  the  initialization  of  the  endmember  sets  E,,  we  pick  three  random  points 

1  Other  values  of  a  can  lead  to  better  results 
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(a)  (b) 

Figure  3.3:  Example  3:  (a)  Synthetic  two-dimensional  data  with  three  convex  hulls  (colors  represent 
true  cluster  assignments),  (b)  Result  of  the  P-COMMEND  algorithm. 


from  each  cluster.  The  result  after  convergence  is  illustrated  in  figure  3.3(b),  where  clusters  are 
represented  using  different  colors,  and  the  endmembers  for  each  cluster  are  represented  using  bolder 
points.  The  assignment  of  a  point  to  a  cluster  is  based  on  its  highest  membership  value.  As  it 
can  be  seen,  P-COMMEND  diverged  from  the  initial  cluster  assignments  and  led  to  clusters  that 
do  not  match  the  distribution  of  the  data.  These  clusters  resulted  from  the  convex  geometry  of 
the  estimated  endmembers.  Consequently,  the  detected  erroneous  endmembers  and  the  associated 
abundances  will  have  a  negative  impact  on  any  further  processing  or  analysis  based  on  them. 

Based  on  the  above  three  examples,  the  motivation  behind  our  approach  is  two-fold.  First, 
there  is  a  need  for  an  unmixing  algorithm  that  is  able  to  find  multiple  sets  of  endmembers  for  an 
input  hyperspectral  data  and  not  only  a  unique  set.  In  general,  multiple  sets  would  provide  a  better 
description  of  the  hyperspectral  scene.  This  is  because  hyperspectral  images  tend  to  be  large  and 
may  contain  multiple  distinct  regions  that  do  not  share  common  materials,  and  hence  need  multiple 
sets  of  endmembers  to  fully  describe  it.  Second,  there  is  a  need  to  find  sets  of  endmembers  that 
represent  semantically  meaningful  regions  of  the  hyperspectral  image.  In  other  words,  we  need  to 
take  into  account  the  distribution  of  the  data  in  the  spectral  space  while  unmixing. 

The  idea  of  the  proposed  approach  is  based  on  context-based  processing  which  has  been 
shown  to  be  important  in  many  signal  processing  applications  [80-82].  In  [80,81],  the  authors 
presented  a  fusion  method,  called  Context-Dependent  Multi-Sensor  Fusion  (CDMSF),  which  fuses 
the  results  of  multiple  landmine  detection  algorithms  that  use  different  types  of  features,  different 
classification  methods,  and  different  sensors.  The  approach  was  motivated  by  the  fact  that  the 
relative  performance  of  different  detectors  can  vary  significantly  depending  on  the  sensor,  mine 


type,  geographical  site,  soil  and  weather  conditions,  and  burial  depth. 

In  [82],  the  authors  point  out  that  CDMSF  [80,81]  treats  the  partitioning  of  the  feature  space  and 
the  selection  of  local  expert  classifiers  as  two  independent  processes  performed  sequentially.  They 
claim  that  these  two  tasks  are  not  independent,  and  their  optimization  should  be  combined.  To 
alleviate  this  limitation,  they  proposed  a  generic  framework  for  context-dependent  fusion,  called 
Context  Extraction  for  Local  Fusion  (CELF)  [82],  that  jointly  optimizes  the  partitioning  of  the 
feature  space  and  the  fusion  of  the  classifiers.  They  defined  a  novel  objective  function  that,  when 
optimized,  produces  contexts  via  unsupervised  clustering  while  simultaneously  providing  optimal 
fusion  parameters  for  each  context.  The  authors  have  shown  that  CELF  can  identify  meaningful 
and  coherent  clusters  where  different  expert  algorithms  can  be  identified.  Their  experiments  have 
also  indicated  that  their  proposed  fusion  approach  outperforms  all  individual  detectors. 

Motivated  by  the  above  context  dependent  processing,  our  proposed  unmixing  approach 
adapts  the  unmixing  to  different  regions  of  the  spectral  space. 


3.2  Context  Dependent  Spectral  Unmixing 


The  Context  Dependent  Spectral  Unmixng  (CDSU)  algorithm  is  a  local  approach  that  adapts 
the  unmixing  to  different  regions  of  the  spectral  space.  It  is  based  on  a  novel  objective  function  that 
combines  context  identification  and  spectral  unmixing  into  a  joint  function.  This  objective  function 
models  contexts  as  compact  clusters  and  uses  the  linear  mixing  model  as  the  basis  for  unmixing. 
The  unmixing  provides  optimal  endmembers  and  abundances  for  each  context. 

In  the  following,  we  assume  that  we  have  N  spectral  signatures,  X  =  {xj  £  Mr,j  =  1, ...,  N}, 
obtained  from  a  hyperspectral  scene,  each  having  d  spectral  bands.  CDSU  combines  context  identi¬ 
fication  and  spectral  unmixing  into  a  joint  objective  function.  It  takes  into  account  the  distribution 
of  the  data  in  the  spectral  space  when  finding  the  regions  or  contexts,  and  not  only  the  convex 
geometry  of  the  spectral  unmixing  as  in  [49].  CDSU  achieves  these  two  tasks  by  minimizing  the 
following  objective  function: 
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and 


P ij  >  0,  and  lixMpfj  =  1  ,Vz,j.  (3.5) 

In  the  above,  Xj  is  a  1  x  d  vector  representing  the  jth  pixel  spectrum,  C  is  a  user-specified 
constant  that  represents  the  number  of  contexts  to  be  extracted,  and  M  is  the  number  of  endmembers 
for  each  context.  In  (3.3),  p,:j  is  a  1  x  M  vector  representing  the  proportion  values  for  pixel  j  with 
respect  to  the  zth  context,  and  Ei  is  a  M  x  d  matrix  such  that  each  row  of  Ej,  e.j*,,  is  the  1  x  d 
vector  representing  the  kth  endmember  in  the  zth  context.  The  membership  values,  utj ,  indicate  the 
fuzzy  degree  to  which  the  jth  sample  belongs  to  the  zth  context.  Finally,  a  and  (3  =  [/3i,  ...,/?c]  are 
positive  constants  used  to  balance  the  three  terms  of  the  objective  function. 

The  first  term  in  (3.3)  is  an  unsupervised  learning  component.  It  is  the  sum  of  intra¬ 
cluster  distances  and  is  the  objective  function  used  in  the  Fuzzy  C- Means  (FCM)  algorithm  [72]. 
It  seeks  to  partition  the  N  samples  into  C  clusters,  and  represent  each  cluster  by  a  center  c i. 
Each  sample  Xj  will  be  assigned  to  each  cluster  z  with  a  membership  degree  Uij.  In  this  term, 
to  €  (1,  Too)  is  used  to  control  the  degree  of  fuzziness  [72].  The  second  and  third  terms  in  (3.3) 
represent  the  spectral  unmixing  component.  This  component  attempts  to  learn  cluster-dependent 
endmembers  and  abundances.  The  second  term  is  the  residual  sum  of  squares  (RSS)  between 
actual  pixels  and  their  estimates  from  the  endmembers  and  abundances.  The  third  term  is  the  sum 
of  squared  distances  (SSD)  between  each  pair  of  endmembers  in  an  endmember  set,  representing 
an  approximation  of  the  volume  enclosed  by  these  endmembers.  Note  that  these  last  two  terms 
represent  the  objective  function  of  the  P-COMMEND  algorithm  [49]  with  the  exception  that  we  use 
C  balancing  parameters  /3,  (z  =  1,  ...,(7),  one  for  each  context,  unlike  P-COMMEND  which  uses  a 
unique  balancing  parameter  a.  This  allows  for  more  flexibility  since  the  simplexes  in  all  contexts 
may  not  be  of  equal  volumes. 

When  all  terms  are  combined  and  the  parameters  a  and  /3  are  chosen  properly,  the  algorithm 
seeks  to  partition  the  data  into  compact  clusters  while  learning  the  endmembers  and  abundances 
for  each  cluster. 

Using  the  matrix  notation 

M-l  M 

y  (eik  -  e,;i)(e.;fc  -  e.;/)T  =  Mtrace(EtEf)  -  lixMEjEf lMxi,  (3.6) 
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the  objective  function  in  (3.3)  can  be  rewritten  as 

C  N 
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The  goal  is  to  identify  the  optimal  endmember  sets  E,  =  {e,;fc,fc  =  1  i  =  1  the 

proportion  sets  P;  =  {Pij,j  =  1  ,  ...,1V},  i  =  1  ,  ...,C,  the  memberships  U  =  [a?::/]i=i,...jc,;j=i,...,JV  and 

the  centers  c»,  i  =  1,  ...,(7,  that  minimize  (3.7)  subject  to  (3.4)  and  (3.5). 

Lagrange  multipliers  [83]  is  a  powerful  tool  for  solving  optimization  problems  without  the 
need  to  explicitly  solve  the  constraints  and  use  them  to  eliminate  extra  variables.  For  the  CDSU 
optimization,  we  incorporate  the  constraints  in  (3.4)  and  (3.5)  into  the  objective  function  in  (3.7) 
using  Lagrange  multipliers  and  obtain 
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where  A  =  [Ai,  ...,  Xn]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  N  constraints  on 
the  memberships  in  (3.4),  T  =  [711,  ...,  7 cn ]>  and  S  =  [^n,  ...,  £cn]  are  vectors  of  Lagrange 
multipliers  corresponding  to  the  C  x  N  constraints  on  the  proportions  p,:j  in  (3.5). 

The  first-order  necessary  conditions  theorem  in  [83]  states  that  a  local  minimizer  of  the 
objective  function  J  necessarily  satisfies  the  Karush-Kuhn- Tucker  (KKT)  conditions.  The  first- 
order  derivative  of  the  Lagrangian  L  with  respect  to  the  minimizing  variable  evaluated  at  the  local 
minimizer  is  necessary  equal  to  zero.  Furthermore,  the  second-order  sufficient  conditions  theorem 
in  [83]  states  that  for  a  feasible  variable,  satisfying  the  KKT  conditions,  to  be  a  local  minimizer  of 
J,  it  is  sufficient  that  the  second-order  derivative  of  the  Lagrangian  L  with  respect  to  this  variable 
is  positive  definite. 

Theorem  3.2.1,  the  proof  of  which  is  given  in  Appendix  A,  gives  the  update  equations  for 
the  endmember  sets,  the  proportion  sets,  the  memberships,  and  the  centers  that  minimize  J. 


Theorem  3.2.1.  The  first  and  second  order  conditions  yield  the  following  local  minimizers  of  J : 
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By  examining  (3.11),  we  notice  that  a  pixel  j  will  have  a  high  membership  in  cluster  i  if: 
(i)  its  spectra  is  close  to  the  centroid,  c,,  of  that  cluster  in  the  feature  space;  and  (ii)  it  fits  the 
endmember  model  of  that  cluster  (i.e.  small  error  of  fit). 

CDSU  is  an  iterative  algorithm  that  involves  successive  updates  of  the  endmember  proportion 
sets  Pj,  the  memberships  U,  the  endmember  sets  Ej,  and  the  clusters’  centers  c It  is  summarized 
in  Algorithm  3.1. 


Algorithm  3.1  Context  Dependent  Spectral  Unmixing  (CDSU) 


Inputs:  X:  the  data  points  ( N  x  d ). 

C:  the  number  of  contexts. 

M:  the  number  of  endmembers  for  each  context, 
m:  the  fuzzifier,  m  £  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 

(3 :  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples, 
c, :  the  cluster  centers. 

Ej:  the  sets  of  endmembers  in  all  clusters. 

P.p  the  sets  of  proportions  in  all  clusters. 

Initialize  Cj  and  E^. 

repeat 

Update  P,;  using  (3.10). 

Update  U  using  (3.11). 

Update  Ej  using  (3.9). 

Update  Cj  using  (3.12). 

until  parameters  do  not  change  significatively 

return  U,  Cj,  Ej,  Pj 


The  CDSU  algorithm  is  of  linear  time  per  iteration.  It  runs  in  0{C  x  M  x  N)  time  per 
iteration,  where  C  is  the  number  of  contexts,  M  is  the  number  of  endmembers  per  context,  and 
N  is  the  number  of  data  points.  In  the  current  implementation  of  the  algorithm,  the  centers  are 
initialized  using  the  Fuzzy  C-Means  algorithm  [72]  (which  is,  in  turn,  randomly  initialized),  and  the 
endmember  sets  are  initialized  using  the  Minimum  Volume  Simplex  Analysis  (MVSA)  algorithm  [70]. 
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Furthermore,  the  convergence  of  the  algorithm  is  checked  by  comparing  the  values  of  the  objective 
function  from  successive  iterations.  If  the  difference  is  below  some  threshold,  the  algorithm  is 
stopped. 


3.3  Context  Dependent  Spectral  Unmixing  Using  the  Mahalanobis  Distance 


The  current  CDSU  algorithm  uses  the  Euclidean  distance  in  the  clustering  component  (first 
term  in  (3.7)).  As  a  result,  it  is  restricted  to  identifying  spherical  clusters.  Moreover,  clustering 
in  a  high  dimensional  space,  as  is  the  case  of  hyperspectral  data,  is  a  challenging  task.  Features 
might  not  be  equally  important  and  some  of  them  can  be  highly  correlated.  This  may  result  in 
non-spherical  clusters. 

In  this  section,  we  extend  CDSU  to  use  the  Mahalanobis  distance  instead  of  the  Euclidean 
distance.  This  would  allow  more  flexibility  in  the  shape  of  the  clusters  instead  of  the  traditional 
spherical  shape. 

Using  the  same  notation  as  in  Section  3.2,  the  objective  function  becomes 
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subject  to  (3.4),  (3.5),  and 


det(Ai)  =  <ji  Mi. 


(3.14) 


In  (3.13),  A i  is  a  symmetric  and  positive  dehnite  matrix.  Fixing  the  determinant  of  the 
norm  matrix  A,;  to  a  positive  constant  cr,;  allows  the  algorithm  to  search  for  a  cluster  shape  that  fits 
the  data  while  preserving  the  volume  of  the  cluster. 

Theorem  3.3.1,  the  proof  of  which  is  given  in  Appendix  B,  gives  the  update  equations  of  the 
endmember  sets,  the  proportions  sets,  the  memberships,  the  centers,  and  the  norm  matrices  that 
minimize  Jm- 


Theorem  3.3.1.  Optimizing  the  objective  function  in  (3.13)  using  the  Lagrange  multipliers  method 
leads  to  the  same  update  equations  for  the  endmember  sets,  the  proportions,  and  the  centers  as  for 
CDSU  using  the  Euclidean  distance  (equations  (3.9),  (3.10)  and  (3.12)  respectively) . 

The  update  equation  for  the  memberships,  u^,  becomes 
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Finally,  the  update  equation  for  the  norm  matrices,  A,,  is 

Aj  =  [aidetfCi)}^  Cf1 ,  (3.16) 

where 

N 

E  uij(xi  ~  ci)T(Xj  -  C i) 

c,  =  — - ^ -  (3.17) 

E^ 

1=1 

is  f/ie  /uzz?/  covariance  matrix  of  cluster  i. 

Observing  (3.15),  we  notice  that  a  pixel  j  will  have  a  high  membership  in  cluster  i  if:  (i)  its 
spectra  has  a  small  Mahalanobis  distance  to  the  centroid,  ct,  of  that  cluster  in  the  feature  space; 
and  (ii)  it  fits  the  endmember  model  of  that  cluster. 

The  norm  matrices  A;  in  (3.16)  are  a  function  of  the  inverse  of  the  fuzzy  covariance  matrices 
C,; .  This  allows  CDSU  to  account  for  the  shape  of  the  data  by  normalizing  by  the  variances  of  each 
dimension. 

The  resulting  Context  Dependent  Spectral  Unmixing  using  the  Mahalanobis  distance  (CDSUm) 
is  an  iterative  algorithm  that  uses  alternating  optimization.  It  involves  successive  updates  of  the 
endmember  proportions  p,j,  the  memberships  it^,  the  endmember  sets  E,;,  the  cluster  centers  C;, 
and  the  norm  matrices  A,  ,  until  stabilization.  It  is  summarized  in  Algorithm  3.2. 

3.4  Semi-supervised  Context  Dependent  Spectral  Unmixing 

Spectral  unmixing  is  a  challenging,  ill-posed,  inverse  problem  that  may  result  in  infinitely 
many  solutions,  most  of  which  are  meaningless.  Moreover,  context  dependent  spectral  unmixing 
is  based  in  part  on  clustering.  However,  clustering  itself  is  a  challenging  task,  especially  in  a  high 
dimensional  feature  space  as  is  the  case  of  hyperspectral  data.  Many  local  minima  solutions  may 
exist. 

To  overcome  this  problem,  we  propose  two  semi-supervised  algorithms  of  CDSUm-  The 
approaches  use  partial  supervision  to  help  guide  the  search  process  and  narrow  the  space  of  possible 
solutions.  The  supervision  information  consists  of  a  small  set  of  pairwise  constraints  which  can  be 
obtained  from  multiple  sources  of  information. 

The  first  algorithm,  called  Cluster  Constrained  Multi-Model  Unmixing  (CC-MMU)  [55], 
uses  constraints  on  the  cluster  assignments  of  the  pixels.  The  second  algorithm,  called  Proportion 
Constrained  Multi-Model  Unmixing  (PC-MMU),  uses  constraints  on  the  proportion  values  of  the 
pixels. 
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Algorithm  3.2  Context  Dependent  Spectral  Unmixing  using  the  Mahalanobis  Distance  (CDSUm) 


Inputs:  X:  the  data  points  ( N  x  d). 

C :  the  number  of  contexts. 

M:  the  number  of  endmembers  for  each  context, 
m:  the  fuzzifier,  m  €  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 

(3:  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
a,  >  0,  i  =  1..C:  the  determinants  of  the  norm  matrices. 

Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples, 
c, :  the  cluster  centers. 

E, :  the  sets  of  endmembers  in  all  clusters. 

Pj:  the  sets  of  proportions  in  all  clusters. 

A,;:  the  norm  matrices  for  all  clusters. 

Initialize  c A,;  and  E, . 

repeat 

Update  P,;  using  (3.10). 

Update  U  using  (3.15). 

Update  Ej  using  (3.9). 

Update  c.j  using  (3.12). 

Update  A,  using  (3.16)  and  (3.17). 
until  parameters  do  not  change  significatively 
return  U,  c,:,  Ej,  P,,  A; 


3.4.1  Cluster  Constrained  Multi-Model  Unmixing 


CC-MMU  uses  constraints  on  the  cluster  assignments  of  the  pixels.  We  assume  that  we 
dispose  of  two  sets  of  constraints:  a  set  of  should-link  pairs,  S ,  such  that  (j,  k)  £  S  means  that 
Xj  and  Xfc  should  be  assigned  to  the  same  cluster,  and  a  set  of  should  not-link  pairs,  A f,  such  that 
(j,  k)  £  M  means  that  Xj  and  x^.  should  not  be  assigned  to  the  same  cluster.  Each  should-link  and 
should  not-link  constraint  has  a  violation  cost  pjk •  The  constraints  and  their  costs  can  be  obtained 
by  labeling  few  pixels  in  the  hyperspectral  image  or  by  using  information  extracted  from  a  different 
sensor. 


Following  the  same  notation  as  in  Section  3.3,  the  proposed  CC-MMU  minimizes  the  follow¬ 
ing  objective  function: 

C  N 

Jc(Ej,Pj,U,  Cj,  A;)  =  J]  E  uij(xj  -  Cj)Ai(xi  -  Cj)T 

i=i  i=i 

+t(  E  E  E  •  E  E  "I'/) 


C  r  N 


+a  E  E  uij (xi  -  PijEj)(xi  -  PijEj)T  +  /3i(Mtrace{E,'Ef )  -  1 ,  Y  A;  E,  E '  lMxi)  ,  (3.18) 

i—1  L  j—1  J 

subject  to  the  constraints  in  (3.4),  (3.5)  and  (3.14).  7  is  a  positive  constant. 

In  addition  to  the  terms  of  the  CDSUm  objective  function  described  in  Section  3.3,  the 
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objective  function  of  CC-MMU  contains  a  semi-supervision  term  which  would  equal  to  zero  when 
none  of  the  defined  should-link  and  should  not-link  constraints  are  violated. 

We  optimize  (3.18)  with  respect  to  the  centers  Cj,  the  memberships  ul3 ,  the  norm  matri¬ 
ces  A i,  the  proportions  Pij  and  the  endmembers  Ej,  using  the  Lagrange  multipliers  optimization 
method. 

Theorem  3.4.1,  the  proof  of  which  is  given  in  Appendix  C,  gives  the  update  equations  of  the 
local  minimizers  of  Jc- 


Theorem  3.4.1.  The  update  equations  for  the  endmember  sets,  the  abundances,  the  centers  and 
the  norm  matrices  are  similar  to  the  ones  of  CDSUm  (equations  (3.9),  (3.10),  (3.12)  and  (3.16) 
respectively) . 

The  update  equation  for  the  memberships  becomes 


Uij 


dfj  +  'ycostij  +  afiti 


c  r 

E  d\j  +  7  costqj  +  a  fit 

g=l  L 


(3.19) 


where 

d%  =  (xj  -  c,  •  A , !  x ,  -  a)T,  (3.20) 

c  _ 

costqj  =  y  (  y '  pjkUik  +  y '  pjkuik,  (3. 21) 

U,k)es  (j,k)£Af 

and 

fitij  =  {xj  -  p, ,  K,  )i  x,  -  p,,E;)7  .  (3.22) 


In  (3.19),  d(j  represents  the  scpiared  Mahalanobis  distance  between  pixel  j  and  the  center 
of  cluster  i,  costij  represents  the  cost  of  violating  the  constraints  related  to  pixel  j  with  respect  to 
cluster  i,  and  fitij  represents  the  squared  error  of  fit  of  pixel  j  to  the  convex  model  i.  Examining 
(3.19),  we  notice  that  a  pixel  j  will  have  a  high  membership  in  cluster  i  if:  (i)  its  spectrum  is  close 
to  the  centroid,  c i,  of  that  cluster  in  the  spectral  space;  (ii)  it  does  not  violate  many  constraints 
that  involve  it;  and  (iii)  it  fits  the  model  of  that  cluster. 

The  resulting  CC-MMU  is  an  iterative  algorithm  that  uses  alternating  optimization.  It  in¬ 
volves  successive  updates  of  the  norm  matrices  A, ,  the  endmember  proportions  Py ,  the  memberships 
Uij,  the  endmember  sets  Ej,  and  the  clusters’  centers  Cj,  until  stabilization.  It  is  summarized  in 
Algorithm  3.3. 
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Algorithm  3.3  Cluster  Constrained  Multi-Model  Unmixing  (CC-MMU) 


Inputs:  X:  the  data  points  ( N  x  d). 

C :  the  number  of  contexts. 

M:  the  number  of  endmembers  for  each  context. 
to:  the  fuzzifier,  to  €  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 

(3:  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
a,  >  0,  i  =  1..C:  the  determinants  of  the  norm  matrices. 

7  >  0:  the  weight  of  the  semi-supervised  term. 

S:  the  set  of  should-link  constraints. 

A f:  the  set  of  should  not-link  constraints. 

Pjk  >  0 ,\/(j,k)  €  5  or  J\f:  the  constraints  violation  costs. 

Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples, 
c, :  the  cluster  centers. 

E, :  the  sets  of  endmembers  in  all  clusters. 

Pp  the  sets  of  proportions  in  all  clusters. 

A :  the  norm  matrices  for  all  clusters. 

Initialize  c, ,  A,  and  E;. 

repeat 

Update  P,  using  (3.10). 

Update  U  using  (3.19),  (3.20),  (3.21)  and(3.22). 

Update  Ej  using  (3.9). 

Update  c.j  using  (3.12). 

Update  A,  using  (3.16)  and  (3.17). 
until  parameters  do  not  change  significatively 
return  U,  c,:,  E,:,  P;,  At 


3.4.2  Proportion  Constrained  Multi-Model  Unmixing 


PC-MMU  uses  constraints  on  the  proportion  values  of  the  pixels.  We  assume  that  we  dispose 
of  a  set  of  pixel  pairs,  S ,  such  that  (j.  k)  G  S  means  that  Xj  and  x*,  should  have  similar  proportion 
values  in  the  extracted  endmembers.  Each  constraint  has  a  violation  cost  pjk ■  The  constraints  and 
their  costs  can  be  obtained  through  some  ground  truth  knowledge  or  through  consensus  unmixing 
as  will  be  shown  in  the  next  chapter. 

Following  the  same  notation  as  in  Section  3.3,  the  proposed  PC-MMU  minimizes  the  follow¬ 
ing  objective  function: 
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f(xj  -  p, , E,  )(Xj  -  PijE,)T  -(-  fii(Mtrace(EiEf)  -  lixMEjEf  lMxi)  ,  (3.23) 


2=1  Lj  =  1  J 

subject  to  the  constraints  in  (3.4),  (3.5)  and  (3.14).  7  is  a  positive  constant. 

In  addition  to  the  terms  of  the  CDSUm  objective  function  described  in  Section  3.3,  the 
objective  function  of  PC-MMU  contains  a  semi-supervision  term  which  would  equal  to  zero  when 
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none  of  the  defined  proportion  constraints  are  violated. 

We  optimize  (3.23)  with  respect  to  the  centers  c,;,  the  memberships  tty,  the  norm  matri¬ 
ces  A,,  the  proportions  p and  the  endmembers  Ej,  using  the  Lagrange  multipliers  optimization 
method. 

Theorem  3.4.2,  the  proof  of  which  is  given  in  Appendix  D,  gives  the  update  equations  of  the 
local  minimizers  of  Jp. 


Theorem  3.4.2.  The  update  equations  for  the  endmember  sets,  the  memberships,  the  centers  and 
the  norm  matrices  are  similar  to  the  ones  of  CDSUm  (equations  (3.9),  (3.15),  (3.12)  and  (3.16) 
respectively) . 

The  update  equation  for  the  proportions  becomes 
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The  resulting  PC-MMU  is  an  iterative  algorithm  that  uses  alternating  optimization.  It 
involves  successive  updates  of  the  norm  matrices  A, ,  the  proportions  p,j ,  the  memberships  Uij ,  the 
endmember  sets  E,,  and  the  clusters’  centers  ci;  until  stabilization.  It  is  summarized  in  Algorithm 
3.4. 


3.5  Robust  Context  Dependent  Spectral  Unmixing 


One  limitation  of  most  multiple  model  unmixing  methods  is  their  sensitivity  to  noise  and 
outliers  due  to  scene  and/or  sensor  effects  [84].  This  limitation  is  inherited  from  the  fuzzy  clustering 
approach  used  to  learn  the  multiple  convex  sets.  Noise  points  affect  not  only  the  convex  sets,  but 
also  the  estimates  of  the  endmembers  and  proportions  within  each  set. 

Possibilistic  clustering  [85]  has  been  proposed  to  overcome  the  sensitivity  of  fuzzy  clustering 
to  noise.  This  approach  uses  possibilistic  membership  functions  to  identify  and  reduce  the  effect  of 
noise  points.  Possibilistic  memberships  may  however  result  in  identical  clusters.  Recent  approaches 
combine  fuzzy  and  possibilistic  memberships  to  avoid  such  problems  [53]. 

In  this  section,  we  propose  using  both  fuzzy  and  possibilistic  membership  functions  to  develop 
a  robust  multi-model  unmixing  algorithm.  Fuzzy  memberships  are  used  to  partition  the  spectra  space 
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Algorithm  3.4  Proportion  Constrained  Multi-Model  Unmixing  (PC-MMU) 


Inputs:  X:  the  data  points  ( N  x  d). 

C :  the  number  of  contexts. 

M:  the  number  of  endnrembers  for  each  context. 
to:  the  fuzzifier,  to  €  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 

(3:  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
a,  >  0.  i  =  1..C:  the  determinants  of  the  norm  matrices. 

7  >  0:  the  weight  of  the  semi-supervised  term. 

S:  the  set  of  proportion  constraints. 

Pjk  >  0 €  S:  the  constraints  violation  costs. 

Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples. 
cp  the  cluster  centers. 

E, :  the  sets  of  endmembers  in  all  clusters. 

P.;:  the  sets  of  proportions  in  all  clusters. 

A i\  the  norm  matrices  for  all  clusters. 

Initialize  c i,  A i  and  E;. 

repeat 

Update  P,  using  (3.24). 

Update  U  using  (3.15). 

Update  Ej  using  (3.9). 

Update  c.j  using  (3.12). 

Update  A,  using  (3.16)  and  (3.17). 
until  parameters  do  not  change  significatively 
return  U,  c,:,  E,:,  P;,  At 


into  multiple  convex  sets  that  span  the  entire  space  and  avoid  coincident  clusters  [72].  Possibilistic 
memberships  are  used  to  reduce  the  effect  of  noise  and  obtain  robust  estimates  of  the  endmembers 
and  proportions  within  each  cluster. 

The  Robust  Context  Dependent  Spectral  Unmixing  (RCDSU)  combines  fuzzy  and  possibilis¬ 
tic  clustering  with  linear  unmixing  into  a  joint  objective  function.  The  clustering  component  is  used 
to  partition  the  data  into  multiple  contexts.  The  linear  unnrixing  component  learns  endmembers  and 
abundances  within  each  context.  Following  the  same  notation  as  in  Section  3.3,  RCDSU  minimizes 

C  N  C  N 

Jr{ Ei,  P;,  U,  T,  a,  A i)  =  J2  J2  (auij  +  -  c,)Ai(xj  -  a)T  +  J2ViJ2( 1  ~  Uj)n 

i=l  j=l  i= 1  j= 1 

C  r  N  1 

+«E  J2  (auij  +  —  pijEi)(x.j  —  p,jEi)T  +  Pi(Mtrace(Ef  E;)  —  liXA/EiEf  Imxi)  ,  (3.26) 

i= 1  L  j=l  J 

subject  to  the  constraints  in  (3.4),  (3.5),  (3.14),  and 

Uj  G  [0,l],V*,j.  (3.27) 

The  first  and  second  terms  in  (3.26)  represent  the  unsupervised  learning  component.  They  seek 
to  partition  the  N  samples  into  C  clusters,  and  represent  each  cluster  by  a  center  c.;  and  a  norm 
matrix  A;.  Each  sample  x?  will  be  assigned  to  each  cluster  i  with  two  types  of  memberships.  The 
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fuzzy  membership  [72],  Uij ,  is  the  degree  with  which  pixel  j  belongs  to  cluster  i;  and  the  possibilistic 
membership  [85],  tij,  is  the  possibility  or  typicality  that  pixel  j  belongs  to  cluster  i.  The  memberships 
are  weighted  by  two  positive  constants,  a  and  b  (with  a  +  b  =  1),  that  control  the  prevalence  of  each 
membership  for  the  task  at  hand.  The  constants  m  and  n  £  (1,  +oo)  are  the  fuzzifiers  [72]  for  both 
memberships.  The  second  term  in  (3.26)  forces  the  possibilistic  memberships,  Uj,  to  be  as  large  as 
possible,  thus  avoiding  the  trivial  solution  of  having  them  all  equal  to  zero. 

The  third  and  fourth  terms  in  (3.26)  represent  the  spectral  unmixing  component  and  are 
similar  to  the  ones  of  CDSU.  Finally,  rj  =  [771,  are  positive  values. 

We  optimize  (3.26)  with  respect  to  all  parameters  using  the  Lagrange  multipliers  optimiza¬ 
tion  method.  Theorem  3.5.1,  the  proof  of  which  is  given  in  Appendix  E,  gives  the  update  equations 
of  the  local  minimizers  of  Jr. 


Theorem  3.5.1.  The  update  equations  for  the  proportions  and  the  memberships  are  similar  to  the 
ones  of  CDSU m  (equations  (3.10)  and  (3.15)  respectively). 

The  update  equation  for  the  endmember  sets,  E* ,  is 
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The  update  equation  for  the  norm  matrices,  A*,  is  the  same  as  in  equation  (3.16),  but  with 

E  -  ci)T(xi  -  c* ) 
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The  update  equation  for  the  typicalities,  tij ,  is  given  by 
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Finally,  the  update  equation  of  the  cluster  centers  c*  is 
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Unlike  the  fuzzy  memberships,  the  possibilistic  memberships  are  not  constrained  to  sum  to 
one.  They  provide  an  intuitive  notion  of  typicality.  Note  that  the  larger  costij  is,  the  smaller  the 
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typicality  of  pixel  j  in  cluster  i.  This  means  that  a  pixel  that  is  far  from  any  cluster  and  that 
does  not  fit  any  convex  model  would  have  small  typicality  values  in  all  clusters,  and  hence  can  be 
considered  as  an  outlier  or  a  noise  point.  The  value  of  rji  determines  when  the  membership  of  a 
point  in  cluster  i  becomes  0.5  (i.e.,  the  3  dB  point).  Thus,  it  needs  to  be  chosen  depending  on  the 
desired  “bandwidth”  of  the  membership  distribution  for  that  cluster. 

The  resulting  RCDSU  is  an  iterative  algorithm  that  uses  alternating  optimization.  It  involves 
successive  updates  of  the  endmember  proportions  p-ij ,  the  fuzzy  memberships  Uij,  the  possibilistic 
memberships  Uj,  the  endmember  sets  E;,  the  cluster  centers  c,;,  and  the  norm  matrices  A,;,  until 
convergence.  It  is  summarized  in  Algorithm  3.5. 

Algorithm  3.5  Robust  Context  Dependent  Spectral  Unmixing  (RCDSU) 


Inputs:  X:  the  data  points  ( N  x  d ). 

C :  the  number  of  contexts. 

M:  the  number  of  endnrenrbers  for  each  context. 
to,  n:  the  fuzzifier,  m,  n  €  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 
f3:  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
a,  >  0,  i  =  1..C:  the  determinants  of  the  norm  matrices, 
a,  b:  weights  of  the  fuzzy  and  possibilistic  memberships  (a  +  b  =  1). 
Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples. 

T:  the  possibilistic  membership  matrix  of  the  data  samples. 
cf.  the  cluster  centers. 

E, :  the  sets  of  endmembers  in  all  clusters. 

P.;:  the  sets  of  proportions  in  all  clusters. 

A,:  the  norm  matrices  for  all  clusters. 

Initialize  c, ,  A i  and  E;. 

repeat 

Update  P,  using  (3.10). 

Update  U  using  (3.15). 

Update  T  using  (3.30)  and  (3.31). 

Update  Ej  using  (3.28). 

Update  c.j  using  (3.32). 

Update  A,  using  (3.16)  and  (3.29). 
until  parameters  do  not  change  significatively 
return  U,  T,  ci;  E,,  P,,  A.( 


3.6  Unsupervised  Robust  Context  Dependent  Spectral  Unmixing 

The  proposed  RCDSU  algorithm,  like  other  multi-model  unmixing  algorithms,  assumes  that 
the  optimal  number  of  endmember  sets  is  known.  However,  this  may  not  be  the  case,  and  it  should 
be  learned  from  the  data.  In  this  section,  we  extend  RCDSU  to  find  the  “optimal”  number  of 
contexts  in  an  unsupervised  way. 
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Our  approach  is  inspired  by  [86]  and  explores  properties  of  the  possibilistic  membership 
functions.  First,  we  overspecify  the  number  of  clusters  C  and  run  the  RCDSU  algorithm.  Second, 
we  ignore  the  fuzzy  memberships  (by  setting  a  to  0  and  b  to  1)  and  run  RCDSU  to  allow  clusters 
to  expand  over  neighboring  regions.  Since  the  possibilistic  memberships  are  not  constrained,  small 
clusters  covering  the  same  dense  regions  would  expand  and  become  similar.  Finally,  we  use  a 
similarity  measure  to  identify  similar  clusters  and  merge  them.  We  use 

N 

X"!  |  tjk  tjk  | 

r - ^ - •  (3-33) 

X]  Uk  +  Xj  tjk 

k=  1  fc=  1 

Sjj  does  not  depend  on  the  distance  measure  explicitly,  and  therefore,  it  can  be  used  independently 
of  the  clusters  shape  and  size.  More  importantly,  it  does  take  into  account  the  model  fitting  error 
(used  implicitly  in  the  memberships).  It  can  be  easily  shown  that  0  <  Sjj  <  1,  where  Stj  =  1  when 
clusters  i  and  j  are  identical,  and  Sl3  =  0  when  the  clusters  are  disjoint. 

The  Unsupervised  RCDSU  (U-RCDSU)  is  summarized  in  Algorithm  3.6.  We  should  point 
out  here  that  even  when  the  number  of  clusters  is  known  a  priori,  it  is  recommended  to  overspecify 
the  number  of  clusters.  This  makes  the  results  less  sensitive  to  initialization  and  gives  tiny  clusters 
(compared  to  other  clusters)  a  better  chance  of  being  detected. 
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Algorithm  3.6  Unsupervised  Robust  Context  Dependent  Spectral  Unmixing  (U-RCDSU) 


Inputs:  X:  the  data  points  ( N  x  d). 

C  =  Cmax'-  the  over-specified  number  of  clusters. 

M:  the  number  of  endmembers  for  each  context, 
m,  n:  the  fuzzifier,  m,  n  €  (l,+oo). 

a:  the  weight  of  the  second  term  in  the  objective  function. 
f3:  the  vector  of  weights  in  the  second  term  of  the  objective  function  (1  x  C). 
<jj  >  0,  i  =  1..C:  the  determinants  of  the  norm  matrices, 
a,  b:  weights  of  the  fuzzy  and  possibilistic  memberships  (a  +  b  =  1). 
Outputs:  U:  the  fuzzy  membership  matrix  of  the  data  samples. 

T:  the  possibilistic  membership  matrix  of  the  data  samples. 
cf.  the  cluster  centers. 

E, :  the  sets  of  endmembers  in  all  clusters. 

P.;:  the  sets  of  proportions  in  all  clusters. 

A,;:  the  norm  matrices  for  all  clusters. 

Initialize  c, ,  A,  and  E, . 

TJ.T.c,.  A,.K,.P,j  =  RCDSU(Ci,  Ai.Ei). 

Set  a  to  0,  b  to  1. 

repeat 

[U,  T,  Ci,  Aj,  Ej,  =  RCDSU(ci,  A,,  E,). 
for  each  pair  of  clusters  i  and  j  do 
Compute  Sij  using  (3.33). 
if  Sij  >  1  —  e  then 
Merge  clusters  i  and  j. 
end  if 
end  for 

if  a  merging  occured  then 
Update  C,  c,;  and  A;. 

Initialize  Ej. 

end  if 

until  no  merging  takes  place 
Reset  a  and  b. 

[U,  T,  c  j ,  A  * ,  E  * ,  Pi]  =  RCDSU(ci,  Au  E,). 

return  U,  T,  Cj,  E,,  PJ:  A t 
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CHAPTER  4 


ROBUST  UNMIXING  USING  CONSENSUS  ANALYSIS 

Spectral  unmixing  is  a  challenging,  ill-posed,  inverse  problem.  Many  algorithms  have  been 
proposed  for  robust,  stable,  and  accurate  unmixing  solutions.  Different  algorithms  have  different 
modes  of  operation  and  usually  yield  different  results.  Moreover,  most  of  them  require  specifying 
the  number  of  endmembers  to  be  extracted  before  hand.  In  Chapter  3,  we  have  proposed  various 
algorithms  that  can  identify  endmembers  while  taking  the  distribution  of  the  data  into  account.  In 
this  chapter,  we  propose  using  consensus  analysis  on  multiple  unmixing  results  to  find  the  “optimal” 
endmembers  in  the  data.  We  run  different  unmixing  algorithms,  using  different  numbers  of  end- 
members,  and  combine  the  results  using  consensus  analysis.  The  claim  is  that  actual  endmembers 
will  have  a  consensus  among  all  runs.  This  chapter  is  organized  as  follows.  First,  we  present  the 
motivation  behind  the  proposed  idea.  Then,  we  outline  the  proposed  consensus  unmixing  approach. 

4.1  Motivations 

Many  unmixing  algorithms  exist  in  the  literature  [29]  and  different  algorithms  often  result 
in  different  endmembers  for  the  same  data.  Moreover,  the  same  algorithm  may  not  result  in  the 
same  endmembers  when  run  multiple  times.  This  is  mainly  due  to  the  non-deterministic  behavior 
of  the  algorithm.  In  fact,  each  method  has  its  own  assumptions  and  approach  for  estimating  the 
endmembers.  The  goal  is  to  take  advantage  of  this  diversity  to  estimate  the  “optimal”  endmembers 
using  consensus  analysis. 

The  idea  has  its  roots  in  consensus  clustering  [57],  where  different  mechanisms  are  proposed 
to  explore  the  consensus  between  multiple  data  partitions.  The  idea  of  consensus  analysis  has 
been  investigated  in  [87]  to  estimate  the  number  of  endmembers.  However,  it  did  not  take  into 
account  different  unmixing  algorithms.  In  fact,  a  single  algorithm  is  run  multiple  times  using  the 
same  number  of  endmembers,  and  the  goal  is  to  find  the  number  that  achieves  the  most  stable 
classification  among  all  runs.  In  this  chapter,  we  investigate  using  multiple  algorithms  with  multiple 
parameter  settings  to  find  an  accurate  and  consistent  set  of  endmembers  in  the  data.  The  claim  is 
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Figure  4.1:  First  3  PC  of  the  data  (blue),  endmembers  from  all  unmixing  runs  (red),  and  true 
endmembers  (green). 

that  “real”  endmembers  will  have  a  consensus  among  multiple  runs. 

To  illustrate  the  diversity  of  different  unmixing  results,  we  use  a  simulated  hyperspectral  data 
of  2000  points  generated  according  to  equation  (1.1),  using  4  endmembers  ( Spessartine ,  Halloysite, 
Chlorite  and  Lizardite)  from  the  USGS  digital  spectral  library  [88].  The  proportions  are  linearly 
generated  between  0  and  0.9  for  each  endmember.  A  zero-mean  Gaussian  noise  at  a  signal-to-noise 
ratio  (SNR)  of  20  dB  was  added  to  the  data.  For  this  example,  we  run  ICE  [61],  VCA  [66],  PPI  [59], 
and  N-FINDR  [60]  using  a  number  of  endmembers  varying  from  3  to  7,  repeating  each  run  4  times 
(giving  a  total  of  80  runs).  We  also  run  U-RCDSU  using  Cmax  =  5  with  M  =  2,  3,  and  4  (we  let 
m  =  n  =  1.5,  a  =  b  =  0.5,  a  =  100,  fa  =  4  Vi,  cq  =  1  Vi,  and  e  =  0.2),  leading  to  1  context  every 
time  with  2,  3  and  4  endmembers  each.  In  total,  we  have  83  endmember  sets. 

In  figure  4.1,  we  scatter  plot  the  first  3  principal  components  (PC)  of  the  data  in  small 
blue  dots,  the  resulting  endmembers  from  all  runs  in  large  red  dots,  and  the  true  endmembers  in 
larger  green  dots.  Since  most  unmixing  methods  are  expected  to  identify  reasonable  endmembers 
on  most  runs,  we  observe  a  concentration  of  estimated  endmembers  around  the  true  endmembers. 
This  observation  is  explored  to  develop  an  approach  for  robust  endmembers  estimation. 

4.2  Consensus  Unmixing 

Let  X  =  {xj  £  Rd,  j  =  1,  be  the  N  spectral  signatures  obtained  from  a  hyperspectral 

image  having  d  spectral  bands.  Let  T  be  the  total  number  of  unmixings  on  X.  The  T  unmixing 
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results  could  be  obtained  by  running  different  unmixing  algorithms,  one  unmixing  algorithm  with 
different  parameters/initializations,  or  a  combination  of  the  two.  Let  E,;  =  {e^/c  =  1  ,  be 

the  set  of  Mj  endmembers  resulting  from  unmixing  i,  i  =  1, T.  Let  Pj  =  {pjik,  j  =  1,  •  ••,  N;  k  = 
be  the  set  of  proportions,  Pjik,  of  endmember  k  in  pixel  j  resulting  from  unmixing  i, 
i  =  l,...,T. 

The  first  step  in  our  approach  is  to  select  a  subset,  kL ,  of  data  points  that  are  more  likely  to 
inform  us  about  the  best  locations  of  the  endmembers.  For  each  endmember  e,;*,,  we  form  an  ap-cut 
set 

(P ik)ap  =  {j I  Pjik  >  ap}.  (4.1) 

(p ik)ap  contains  the  indices  of  the  points  having  proportion  values,  in  endmember  e^,  that  are 
larger  than  ap. 

For  computational  efficiency,  we  try  to  keep  kL  as  small  as  possible  while  maintaining  di¬ 
versity.  Thus,  if  |(Ptfe)Qp|  >  Q,  we  select  the  top  Q  points  having  the  largest  proportion  values. 
Otherwise,  we  consider  the  entire  set.  We  call  these  points  with  high  proportions  the  voters  to  e,;,.. 
The  subset  kL  is  the  union  of  the  voters  to  all  endmembers  in  all  runs: 

n  =  \J(Plk)ap.  (4.2) 

i,k 

The  second  step  of  our  approach  aims  to  reduce  the  size  of  kL  and  keep  only  consistent  voters. 
In  fact,  kL  may  include  voters  to  endmembers  that  are  not  consistent.  In  order  to  filter  those  out,  we 
construct  a  co-association  matrix  [89],  C,  for  the  elements  of  kL,  and  keep  those  with  high  association 
values.  The  co-association  matrix  forms  a  voting  mechanism  that  combines  the  unmixing  results, 
leading  to  a  new  measure  of  similarity  between  points.  The  h  points  in  kL  are  mapped  into  an  h  x  h 
co-association  matrix  C  using 

T  Mi 

=  PjikPlik  (4.3) 

i—1  k—1 

In  other  words,  if  pixels  j  and  l  belong  to  the  same  “group”  of  endmembers,  then  they  have  high 
proportion  values  in  some  common  endmembers  of  that  group.  Hence,  their  co-association  value 
will  be  high.  On  the  other  hand,  if  pixels  j  and  l  do  not  share  any  endmember  within  the  group, 
then  their  co-association  value  will  be  low. 

Using  an  ac-cut  over  C,  we  keep  only  points  having  high  co-association  values.  As  a  result, 
we  obtain  a  smaller  subset,  kLa ,  of  voters  to  mainly  consistent  endmembers: 

kia  =  ( C)ac  =  {j,  l\jjkl  and  C(j,  l )  >  ac}.  (4.4) 
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Algorithm  4.1  Robust  Unmixing  Using  Consensus  Analysis 
Input:  {P,;  =  {pjik,  j  =  1..N;  k  =  Q,  ap,  ac 

Output:  Optimal  endmembers  in  the  data 

(jPik^Qip  —  {j  I  Pjik  ^  1 )  —  1,...,  Mi 

Hi  =  U  (p ik)ap 

i,k 

T  Mi 

c(j,i)  =  E  E  PjikPuk ,  Vj,  z  g  n 

i — X  k — 1 

HLa  =  ( C)ac  =  {j,  i|  j  ^  l  and  C(j,  l )  >  ac} 

Cati,  l)  =C(j,l),  Vj,l  G 

Convert  Ca  to  a  dissimilarity  D  =  exp  ma^g  y 
diag(T>)  =  0 

Use  a  clustering  algorithm  to  identify  clusters  within  D 

Select  one  endmember  from  each  cluster  (as  the  cluster  representative) 


The  subset  T~La  can  be  used  as  the  set  of  constraints  for  the  Proportion  Constrained  Multi-Model 
Unmixing  (PC-MMU)  algorithm  proposed  in  Section  3.4.2. 

The  last  step  of  our  approach  consists  of  identifying  clusters  within  "Ha-  Each  cluster  rep¬ 
resents  a  set  of  similar  endmembers  (estimated  by  the  different  algorithms/runs).  We  also  estimate 
the  optimal  number  of  clusters.  Since  HLa  is  expected  to  include  a  set  of  well-separated  clusters, 
various  clustering  algorithms  could  be  used  for  this  task.  In  our  work,  we  report  results  using  a 
simple  average  link  hierarchical  clustering  [90].  Since  Ca,  sub-matrix  of  C  corresponding  to  T-La,  is  a 
similarity  matrix,  we  first  convert  it  to  a  dissimilarity  matrix  D  using 


D  =  exp 


-Ca 


max  (Ca) 


(4.5) 


Furthermore,  the  diagonal  elements  of  D  are  set  to  zeros,  as  they  represent  the  dissimilarities  of 
identical  points.  Once  the  voters  are  clustered,  a  representative  from  each  cluster  is  chosen  to  be 
an  endmember.  Another  approach  would  be  to  run  an  unmixing  algorithm  on  this  reduced  subset 
of  voters,  using  the  number  of  identified  clusters  as  the  number  of  endmembers.  Algorithm  4.1 
summarizes  the  proposed  approach. 


45 


CHAPTER  5 


CONTEXT  DEPENDENT  HYPERSPECTRAL  SUBPIXEL  TARGET 

DETECTION 

In  this  chapter,  we  introduce  a  new  class  of  subpixel  target  detection  algorithms  that  use 
a  local  structured  background  model.  Our  approach,  referred  to  as  Context  Dependent  Target 
Detectors,  extends  existing  structured  detectors  to  multiple  contexts.  It  is  based  on  the  robust 
context  dependent  spectral  unmixing  algorithm,  presented  in  Chapter  3,  to  model  the  background 
variability.  The  claim  is  that  robust  context  dependent  unmixing  provides  a  better  description  of 
the  background  with  minimum  target  leakage,  compared  to  global  unmixing,  and  hence  results  in 
a  better  target-background  separation.  The  approach  is  evaluated  using  the  Adaptive  Matched 
Subspace  Detector  (AMSD)  [76],  the  Orthogonal  Subspace  Projection  (OSP)  detector  [76]  and  the 
Hybrid  Subspace  Detector  (HSD)  [79]. 

5.1  Motivations 

As  mentioned  in  Section  2.2,  target  detection  algorithms  face  the  challenge  of  target  leakage 
into  the  background,  which  is  the  contribution  of  the  target  pixels  to  the  background  model.  This 
happens  due  to  the  presence  of  targets  in  the  scene.  An  illustration  of  this  is  shown  in  figure  5.1. 
The  blue  dots  represent  the  data  and  form  one  convex  set.  The  cyan  dots  represent  noise  points. 
The  red  cross  represents  a  pixel  containing  the  target.  A  non  robust  unmixing  algorithm  will  identify 
the  endmember  set  displayed  by  the  red  dots.  In  this  case,  the  target  will  have  a  good  fit  in  the 
background  model  and  may  not  be  detected.  A  robust  unmixing  algorithm,  on  the  other  hand,  will 
ignore  noise  and  target  points  and  can  identify  the  endmember  set  displayed  by  green  dots.  In  this 
case,  the  target  does  not  fit  the  background  model  and  can  be  easily  detected. 

The  proposed  robust  context  dependent  spectral  unmixing  (RCDSU)  algorithm,  presented 
in  Section  3.5,  can  be  used  to  solve  this  problem.  In  fact,  targets  can  be  thought  of  as  noise  points 
or  outliers  not  belonging  to  the  background.  They  will  be  assigned  low  typicalities,  meaning  that 
they  will  not  contribute  to  the  estimated  endmembers.  In  other  words,  there  will  be  no  leakage  from 
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Figure  5.1:  Target  (red  cross)  leakage  with  non  robust  unmixing  (red  dots).  No  target  leakage  with 
robust  unmixing  (green  dots). 

the  target  subspace  to  the  background  subspace,  which  is  important  in  target  detection  [76]. 

Another  challenge  with  existing  target  detection  algorithms,  that  are  based  on  the  structured 
background  model,  is  that  they  usually  use  global  methods  to  describe  the  background.  Some  of  them 
use  eigenvector  decomposition  of  the  data  correlation  matrix,  while  others  use  unmixing  algorithms 
that  find  a  single  set  of  endmembers.  These  global  methods  may  not  provide  a  good  description  of 
the  hyperspectral  data,  especially  when  the  scene  includes  multiple  regions  with  distinct  materials. 
Hence,  unmixing  methods  that  can  find  multiple  endmember  sets  are  more  appropriate. 

This  problem  is  illustrated  in  figure  5.2  using  a  2  dimensional  synthetic  data.  The  blue  points 
represent  the  data  that  form  two  convex  sets.  The  red  cross  represents  a  pixel  containing  the  target. 
The  red  dots  represent  the  endmembers  resulting  from  a  typical  single  model  unmixing  algorithm. 
If  these  endmembers  were  used  to  describe  the  background,  the  target  pixel  would  be  a  normal 
background  pixel,  and  hence  might  not  be  detected.  On  the  other  hand,  if  a  multi-model  unmixing 
algorithm  was  used,  a  typical  result  would  be  the  2  sets  of  endmembers  displayed  as  green  dots.  In 
this  case,  the  target  pixel  will  not  fit  the  background  model  as  it  is  located  outside  both  convex 
hulls. 

To  summarize,  our  motivations  are  two-fold.  First,  there  is  the  need  for  a  better  background 
description  using  multiple  sets  of  endmembers.  Second,  there  is  the  need  to  limit  target  leakage  into 
the  background  subspace.  These  can  be  accounted  for  using  the  proposed  robust  context  dependent 
spectral  unmixing  (RCDSU). 

In  the  following,  we  assume  that  the  multiple  contexts  have  been  identified  using  RCDSU. 
Given  a  hyperspectral  data,  we  will  have  the  background  endmember  sets  E,:,  i  =  1..C;  the  propor- 
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Figure  5.2:  Multiple  endmember  sets  (green  dots)  versus  a  one  global  set  (ref  dots)  for  target  (red 
cross)  detection. 

tions  sets  P*  =  { p,;  ? .  j  =  1..7V},  i  =  1..C;  the  fuzzy  memberships  U  =  {tty,  i  =  1..C,  j  =  1..-/V}, 
and  the  cluster  centers  c i:  i  =  1..C. 

5.2  Context  Dependent  Target  Detectors 

In  this  section,  we  extend  the  AMSD  [76],  OSP  [76]  and  HSD  [79]  to  multiple  contexts  using 
RCDSU.  A  test  pixel,  x7,  will  be  assigned  a  local  detection  statistic  within  each  context  i ,  and  these 
local  statistics  will  be  combined  using  the  pixel’s  fuzzy  memberships  resulting  from  RCDSU,  ii,;7 ,  to 
produce  one  confidence  value.  In  the  following,  st  denotes  the  spectral  signature  of  the  target  that 
we  are  trying  to  detect. 

5.2.1  Context  Dependent  AMSD 

The  Context  Dependent  AMSD  (CD- AMSD)  extends  the  AMSD  to  multiple  local  endmem¬ 
ber  sets.  Given  a  test  pixel  Xj  in  the  data,  the  detection  confidence  value  is  computed  as 


(5.1) 


where  P£.  =  I  -  Ef  (E.Ef)-^  is  the  orthogonal  projection  error  matrix  on  the  local  background 
subspace  E,:  resulting  from  RCDSU,  and  Pg,:  =  I  —  Sf  (SiSf)~1Si  is  the  orthogonal  projection  error 
matrix  on  the  local  composite  target  and  background  subspace  S^  =  [st;  E,]. 

5.2.2  Context  Dependent  OSP  Detector 

The  Context  Dependent  OSP  (CD-OSP)  detector  provides  a  local  approach  to  the  traditional 
OSP  detector.  The  detection  statistics  are  computed  within  each  context,  and  combined  using  the 
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fuzzy  memberships  of  the  test  pixel  in  all  sets,  i.e. , 


Tcd-OSp(xj)  =  '52uijStPbix7j 


where  Pjh  is  the  same  as  in  (5.1). 


5.2.3  Context  Dependent  HSD 

The  Context  Dependent  HSD  (CD-HSD)  extends  the  HSD  to  multiple  local  background 


subspaces  using 


rr  l„  -I  \  '  (Xj  ,  (Xj  Pb!jEj)T 

Tcd-hs^x,)  =  ^  »«  (Xj_p„jSJrrI(xj_p,liSir 


In  (5.3),  Ei  is  the  ith  endmember  set  resulting  from  RCDSU,  and  S*  =  [sf;Ej]  is  the  ith  local 
composite  target  and  background  subspace,  pbij  and  pSij  are  the  proportions  of  the  jth  test  pixel 
in  the  local  background  subspace  E;,  and  local  composite  target  and  background  subspace  Sj, 
respectively.  Pbij  results  from  RCDSU  and  psly  is  computed  by  plugging  the  composite  target  and 
background  subspace  S;,  instead  of  E,,  in  the  update  equation  (3.10)  of  the  proportions  of  CDSU. 
The  covariance  matrix  of  context  i,  Tj,  is  estimated  using 

N 

E  (auij  +  -  C i)T(Xj  -  c i) 

ri  =  — - * - ,  (5-4) 

E  (o««  +  Ki) 


which  is  already  determined  by  RCDSU  to  compute  the  norm  matrices  A,. 
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CHAPTER  6 


EXPERIMENTAL  RESULTS 

This  chapter  presents  the  experimental  results  of  the  proposed  algorithms  on  synthetic  and 
real  hyperspectral  data.  The  experiments  were  ran  using  MATLAB  R2011a  on  a  computer  equipped 
with  a  3.6  GHz  Intel  Xeon  processor  and  a  24  GB  RAM.  First,  we  present  the  data  sets  to  be  used 
in  the  experiments,  then  we  present  the  results  of  the  different  algorithms  on  these  data  sets. 

6.1  Data  sets 

Two  kinds  of  data  sets  are  used  in  the  experiments:  simulated  and  real  data.  Simulated 
data  are  used  to  prove  the  concepts  of  the  proposed  algorithms.  Real  data,  on  the  other  hand,  are 
used  to  test  the  proposed  methods  in  real  case  scenarios. 

6.1.1  Simulated  data 

Two  categories  of  simulated  data  are  used:  2-dimensional  and  synthetic  hyperspectral  data. 

6. 1.1.1  Two-dimensional  data 

Three  data  sets  were  used  as  motivational  examples  in  Section  3.1.  These  are: 

•  D2C1M3  in  Section  3.1.1:  it  forms  one  convex  set  with  3  endmembers, 

•  D2C2M3  in  Section  3.1.2:  it  forms  two  convex  sets  with  3  endmembers  each,  and 

•  D2C3M3  in  Section  3.1.3:  it  forms  three  convex  sets  with  3  endmembers  each. 

Another  2-dimensional  data  set  is  generated  to  prove  the  concept  of  the  CDSUm  algorithm: 

•  D2EC2M3:  it  includes  2  elongated  convex  sets  generated  using  the  linear  model  of  equation 
(1.1)  with  3  endmembers  for  each  set.  The  data  is  shown  in  figure  6.1.  Each  cluster  contains 
2000  points. 


50 


4.5 


1  0.5  1  1.5  2  2.5  3  3.5  4 

Figure  6.1:  The  D2EC2M3  synthetic  data  with  two  elongated  convex  hulls. 
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Figure  6.2:  USGS  spectra  used  to  generate  the  synthetic  hyperspectral  data 

6. 1.1.2  Synthetic  hyperspectral  data 

These  are  generated  using  the  United  States  Geological  Survey  (USGS)  digital  spectral 
library  [88] .  The  library  contains  spectra  of  423  minerals,  17  plants  and  some  miscellaneous  materials. 
The  spectra  have  224  spectral  bands,  spanning  the  0.383  -  2.508  pm  wavelength  range.  The  minerals 
Spessartine,  Halloysite ,  Chlorite,  Rectorite,  Lizardite,  Kaolinite  and  Teepleite,  shown  in  figure  6.2, 
are  used  to  generate  the  data.  A  total  of  seven  data  sets  are  generated: 

•  UsgslC2M3:  it  has  two  convex  sets.  Each  convex  set  is  generated  using  the  linear  model 
in  (1.1),  using  three  different  endmembers.  Spectra  of  the  minerals  Spessartine ,  Halloysite, 
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Chlorite ,  Rectorite,  Lizardite  and  Kaolinite  are  used  to  generate  1000  spectral  signatures. 
The  first  three  endmembers  are  used  to  generate  the  first  convex  region,  and  the  last  three 
endmembers  are  used  to  generate  the  second  one,  each  having  500  points.  The  proportions 
for  each  data  point  are  generated  by  sampling  from  a  standard  uniform  distribution  and  are 
normalized  to  sum  to  one.  Zero-mean  Gaussian  noise  is  added  to  the  simulated  spectra  at  three 
noise  levels  resulting  in  3  data  sets.  The  noise  levels  are  adjusted  by  changing  the  variance  of 
the  Gaussian  to  obtain  Signal  to  Noise  Ratios  (SNR)  of  20  dB,  30  dB  and  50  dB  for  the  three 
levels.  The  SNR  is  defined  using  the  logarithmic  decibel  scale  as: 

SNR  =10log10(^-),  (6.1) 

\  *  noise  / 

where  Pdata  is  the  average  power  of  the  data,  and  Pnoise  is  the  average  power  of  the  noise. 

•  Usgs2C2M3:  it  has  two  convex  sets.  Each  convex  set  is  generated  using  the  linear  model 
in  (1.1),  using  three  different  endmembers.  The  six  selected  endmembers  correspond  to  the 
minerals  Spessartine ,  Halloysite ,  Chlorite,  Rectorite,  Kaolinite  and  Teepleite.  The  first  three 
endmembers  are  used  to  generate  the  first  convex  region,  and  the  last  three  endmembers  are 
used  to  generate  the  second  one,  each  having  1000  points.  The  proportions  were  randomly 
generated  from  a  standard  uniform  distribution  and  were  normalized  to  sum  to  one.  We 
randomly  select  few  points  from  each  set  and  add  a  zero-mean  Gaussian  noise  to  them,  at  a 
signal  to  noise  ratio  (SNR)  of  1  dB.  We  experiment  with  noise  levels  of  0%,  5%  and  10%. 

•  Usgs3ClM4:  it  has  one  convex  set  of  2000  points  generated  according  to  equation  (1.1), 
using  4  endmembers  ( Spessartine ,  Halloysite,  Chlorite  and  Lizardite).  The  proportions  are 
linearly  generated  between  0  and  0.9  for  each  endmember.  A  zero-mean  Gaussian  noise  at 
a  signal-to-noise  ratio  (SNR)  of  20  dB  was  added  to  the  data.  This  data  has  been  used  in 
Section  4.1  as  a  motivational  example  for  the  robust  unmixing  using  consensus  analysis. 

6.1.2  Real  data 

Three  real  data  sets  were  used  in  the  experiments:  the  Pavia  University  data,  the  University 
of  Southern  Mississippi  data  and  the  Indian  Pines  data: 

•  Pavia  University  data:  the  data  was  collected  on  July  8,  2002  over  an  urban  area  around  the 
Pavia  University  in  northern  Italy,  using  the  Reflective  Optics  System  Imaging  Spectrometer1 

1Data  available  at  http:/ /www. ehu.es/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes 
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(a)  RGB  image  (b)  Ground  truth  image 

Figure  6.3:  Pavia  University  data 

(ROSIS).  The  ROSIS  sensor  collects  data  over  the  430  -  860  nm  wavelength  range  at  a  4  nm 
spectral  sampling  interval.  The  image  originally  contains  610  x  340  pixels  having  103  spectral 
bands  each  (430  -  838  nm),  with  a  spatial  resolution  of  1.3  meters.  The  scene  consists  of  both 
natural  and  urban  regions  as  shown  in  figure  6.3(a).  Ground  truth  labels  are  provided  for  some 
areas  of  the  scene.  Nine  classes  are  defined:  asphalt,  meadows,  gravel,  trees,  painted  metal 
sheets,  bare  soil,  bitumen,  self-blocking  bricks  and  shadows.  Figure  6.3(b)  shows  the  labeled 
pixels  of  this  image. 

•  University  of  Southern  Mississippi  data:  Two  data  sets  are  available  for  this  site  [91]. 
The  first  one  is  a  LIDAR  image  acquired  using  an  Optech  Inc.  Gemini  Airborne  Topographic 
LIDAR  Mapper  (ALTM)  system.  The  second  one  is  a  hyperspectral  image  acquired  using 
an  ITRES  Inc.  hyperspectral  Compact  Airborne  Spectrographic  Imager  (CASI-1500),  which 
measured  reflectance  in  72  spectral  bands  across  the  Visible  and  Near-Infrared  (VISNIR) 
wavelengths  (375  -  1050  nm)  at  a  10  nm  resolution.  The  first  4  bands  were  removed  due  to 
the  presence  of  negative  values,  leaving  68  bands.  Both  data  sets  were  geometrically  corrected 
to  be  co-registered  with  lm  x  lm  pixels.  The  images,  originally  of  size  325  x  337  pixels,  were 
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Figure  6.4:  RGB  image  of  the  University  of  Southern  Mississippi  data 

acquired  over  the  campus  of  the  University  of  Southern  Mississippi  in  Gulfport,  Mississippi,  in 
November  2010.  The  scene  consists  of  both  natural  and  urban  regions  as  shown  in  figure  6.4. 

•  Indian  Pines  data:  the  data1  consists  of  a  145  x  145  pixels  image,  having  224  spectral 
bands  covering  the  0.4  -  2.5  jmi  wavelength  range.  It  was  collected  by  the  AVIRIS  sensor  over 
the  Indian  Pines  test  site  in  North-western  Indiana.  The  scene,  shown  in  figure  6.5,  contains 
two-thirds  agriculture,  and  one-third  forest  or  other  natural  perennial  vegetation.  There  are 
two  major  dual  lane  highways,  a  rail  line,  as  well  as  some  low  density  housing,  other  built 
structures,  and  smaller  roads.  The  number  of  bands  was  reduced  to  200  by  removing  bands 
covering  the  region  of  water  absorption  (bands  104  -  108,  150  -  163,  and  220). 

6.2  Context  Dependent  Spectral  Unmixing 

In  this  section,  we  present  the  results  of  the  proposed  CDSU  algorithm  on  synthetic  and  real 
data  sets.  We  also  provide  a  comparison  to  the  results  of  the  P-COMMEND  algorithm.  For  consis¬ 
tency,  both  algorithms  were  initialized  similarly.  The  parameters  were  experimentally  determined 
for  both  algorithms. 

1Data  available  at  http://www.ehu.es/ccwintco/index.plip/Hyperspectral_Remote_Sensing_Scenes 
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Figure  6.5:  RGB  image  of  the  Indian  Pines  data 
6.2.1  Evaluation  using  simulated  data 

First,  we  present  the  results  of  our  proposed  approach  on  the  three  toy  data  sets  (D2C1M3, 
D2C2M3  and  D2C3M3)  described  in  Section  3.1.  We  then  apply  our  method  to  the  UsgslC2M3 
simulated  hyperspectral  data  sets  described  in  Section  6.1. 1.2.  Due  to  the  simplicity  of  the  toy  data 
sets,  the  performance  of  the  algorithms  is  evaluated  visually.  However,  for  the  simulated  data  set, 
the  performance  is  evaluated  quantitatively. 

In  the  first  example,  we  use  the  D2C1M3  data  set  of  Section  3.1.1  (displayed  in  figure 
3.1(a)).  The  result  of  the  CDSU  algorithm  on  this  data  set,  using  C  =  1,  M  =  3,  to  =  2,  a  =  1,  and 
/?  =  0.01  is  shown  in  figure  6.6,  where  the  detected  endmembers  are  shown  in  red  X’s.  As  it  can  be 
seen,  the  CDSU  algorithm  succeeded  in  identifying  endmembers  at  the  vertices  of  the  convex  hull 
enclosing  the  data  points  providing  a  tight  fit  around  them. 

As  a  second  example,  we  use  the  D2C2M3  data  set  of  Section  3.1.2  (displayed  in  figure 
3.2(a)).  The  result  of  the  CDSU  algorithm  on  this  data  set,  using  C  =  2,  M  =  3,  m  =  2,  a  =  20 
( a  has  been  increased,  compared  to  D2C1M3,  in  order  to  account  for  the  greater  number  of  data 
points  in  D2C2M3),  and  (3  =  [0.01,0.01]  is  shown  in  figure  6.7.  The  detected  endmembers  are 
shown  in  red  X’s  for  one  cluster  and  in  green  X’s  for  the  other  cluster.  As  it  can  be  seen,  the  CDSU 
algorithm  succeeded  in  identifying  endmember  sets  at  the  vertices  of  the  convex  hulls  enclosing  the 
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Figure  6.6:  Result  of  the  CDSU  algorithm  on  the  D2C1M3  data. 
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Figure  6.7:  Result  of  the  CDSU  algorithm  on  the  D2C2M3  data. 

data  points  and  thus,  providing  a  tight  fit  around  them. 

For  a  third  example,  we  use  the  D2C3M3  data  set  of  Section  3.1.3  (displayed  in  figure 
3.3(a)).  The  result  of  the  CDSU  algorithm  on  this  data  set,  using  C  =  3,  M  =  3,  m  =  2,  a  =  50 
( a  has  been  increased,  compared  to  D2C2M3,  in  order  to  account  for  the  greater  number  of  data 
points  in  D2C3M3),  and  (3  =  [0.1,  0.1, 0.1]  (/3  has  been  increased,  compared  to  D2C1M3  and 
D2C2M3,  in  order  to  account  for  the  tighter  nature  of  the  data  in  D2C3M3)  is  shown  in  figure 
6.8.  The  identified  clusters  are  represented  using  different  colors,  and  the  endmembers  for  each 
cluster  are  represented  using  X’s.  As  it  can  be  seen,  the  CDSU  algorithm  succeeded  in  identifying 
endmember  sets  at  the  vertices  of  the  convex  hulls  enclosing  the  data  points,  providing  a  tight  fit 
around  them.  It  also  provided  clusters  that  match  the  distribution  of  the  data. 

For  a  forth  example,  we  consider  the  UsgslC2M3  simulated  hyperspectral  data  set  de¬ 
scribed  in  Section  6. 1.1. 2.  To  evaluate  the  performance  of  the  CDSU  algorithm  and  compare  it 
to  P-COMMEND,  the  estimated  abundance  fractions  and  endmembers  are  compared  to  the  true 
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Figure  6.8:  Result  of  the  CDSU  algorithm  on  the  D2C3M3  data. 


ones.  Based  on  the  mean  square  error  (MSE),  we  define  the  spectral  mean  error  ( SME )  and  the 
abundance  mean  error  ( AM E )  as 


SME=^~  ||E-E|||, 


(6.2) 


and 


(6.3) 


In  (6.2)  and  (6.3),  M  is  the  total  number  of  endmembers,  d  is  the  number  of  spectral  bands,  and  N 
is  the  number  of  data  points.  The  rows  of  E  and  E  represent  the  true  endmembers  and  the  learned 


endmembers  respectively.  P  is  a  IV  x  M  matrix  representing  the  true  endmember  abundance  frac¬ 


tions.  P  is  a  N  x  M  matrix  representing  the  estimated  endmember  abundance  fractions,  where  each 


proportion  value  is  multiplied  by  the  corresponding  cluster  membership.  The  notation  ||.||i?  stands 
for  the  Frobenius  norm. 

Another  common  performance  metric  is  the  spectral  angle  distance,  which  measures  the  angle  be¬ 
tween  a  signature  e.;  and  its  estimate  e*  [1],  Based  on  this  metric,  we  define  a  spectral  mean  angle 
error  ( SMAE )  as 


(6.4) 


It  is  clear  that  the  performance  of  the  algorithms  increases  as  SME ,  AME,  and  SMAE  approach 
zero.  Notice,  however,  that  the  estimates  of  E  and  P  are  up  to  a  permutation  matrix.  Thus,  a  simple 


algorithm  based  on  the  Hungarian  method  [92]  has  been  designed  and  used  to  infer  the  permutation 


matrix. 

We  run  the  CDSU  and  P-COMMEND  algorithms  using  the  parameters  in  table  6.1.  We  run  each 
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TABLE  6.1 


Parameters  used  for  the  CDSU  and  P-COMMEND  algorithms  on  the  USGS  simulated  hyperspectral 
data 


Parameters 

CDSU 

P-COMMEND 

G 

2 

2 

M 

3 

3 

m 

1.25 

1.25 

a,  f3 

a  =  200,  (3  =  [0.1, 0.1] 

a  =  0.1 

Stopping  criterion 

nU~  (  JcDSu(Iter+l)-JcDSu(Iter)\  ^  in-6 

(  JpcOMMEND(Iter+l)  —  JpcoMMEND(Iter)\  ^  i  r>  — 6 

{Iter  =  Iteration  number) 

Q0S  (  JCnsu(Iter+ 1)  J  <  J-U 

\  JpcOMMEND(Iter+l)  J 

algorithm  25  times  at  each  noise  level.  For  each  run,  we  use  the  FCM  [72]  and  MVSA  [70]  al¬ 
gorithms  for  the  initialization  of  the  memberships,  centers  and  endmembers.  This  experiment  is 
designed  to  test  the  sensitivity  of  CDSU  to  noise  and  initialization,  and  to  provide  a  comparison  to 
P-COMMEND. 

Figures  6.9(a),  (b)  and  (c)  show  a  box  plot  of  the  different  error  metrics  of  both  CDSU  and  P- 
COMMEND  algorithms  across  the  25  runs  and  at  all  noise  levels.  On  each  box,  the  central  red 
mark  represents  the  median  value,  the  edges  of  the  box  are  the  25th  and  75th  percentiles,  the 
whiskers  extend  to  the  most  extreme  values  not  considered  outliers,  and  outliers  are  plotted  indi¬ 
vidually  using  red  crosses.  An  outlier  is  a  value  smaller  than  the  first  quartile  minus  1.5  times  the 
interquartile  range  (third  minus  first  quartile),  or  higher  than  the  third  quartile  plus  1.5  times  the 
interquartile  range. 

Examining  these  results,  we  can  see  that,  for  low  noise  level  cases  (SNR  =  30  and  50  dB),  CDSU  and 
P-COMMEND  perform  similarly.  Both  algorithms  are  also  robust  to  initialization.  As  we  increase 
the  noise  level  (SNR  =  20  dB),  CDSU  starts  to  outperform  P-COMMEND  (p-value=1.4e-09  using 
the  Wilcoxon  rank  sum  test  [93]).  An  explanation  for  this  would  be  that  the  clustering  term  of  the 
CDSU  objective  function  makes  it  more  robust  to  noise  than  P-COMMEND,  since  noise  does  not 
affect  clustering  as  much  as  it  would  affect  spectral  unmixing.  Furthermore,  we  see  that  the  error 
metric  increases  for  both  algorithms  as  the  noise  level  increases,  which  is  expected. 

To  compare  and  illustrate  the  results  further,  we  visualize  the  estimated  endmembers  by  both  algo¬ 
rithms  for  the  case  of  the  highest  noise  level  (SNR  =  20  dB).  We  pick  the  run  in  which  P-COMMEND 
gave  the  highest  error.  Figure  6.10(a)  and  6.10(b)  show  the  true  (solid  lines)  and  estimated  (dashed 
lines)  endmembers  resulting  from  the  CDSU  and  P-COMMEND  algorithms  respectively.  As  it  can 
be  seen,  CDSU  resulted  in  better  estimates  of  the  endmembers. 

To  analyze  the  results  further,  we  check  the  cluster  assignments,  based  on  the  highest  membership 
values,  generated  by  P-COMMEND  and  CDSU.  As  it  can  be  seen  in  table  6.2,  both  algorithms 
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Figure  6.9:  Error  metrics  for  CDSU  and  P-COMMEND  on  the  UsgslC2M3  data  across  the  25 
runs  and  at  all  noise  levels:  (a)  SME ,  (b)  SMAE ,  (c)  AME. 


TABLE  6.2 

Composition  of  the  clusters  generated  by  CDSU  and  P-COMMEND  on  the  UsgslC2M3  data  with 
SNR  =  20  dB 


Convex  set  1 

Convex  set  2 

Cluster  1 

500 

0 

Cluster  2 

0 

500 

succeeded  in  generating  clusters  that  are  pure  and  that  correspond  to  the  two  original  convex  re¬ 
gions.  To  analyze  the  fuzzy  partition,  we  analyze  the  membership  values  of  the  data  points  in  each 
cluster.  We  plot  the  two  first  principal  components  of  the  data  and  we  color  each  point  with  a  shade 
corresponding  to  its  membership  value  in  a  specific  cluster.  Figures  6.11(a)  and  (b)  show  the  mem¬ 
bership  values  in  cluster  1  generated  by  P-COMMEND  and  CDSU.  Similarly,  figures  6.12(a)  and 
(b)  show  the  membership  values  in  cluster  2.  Compared  to  CDSU,  P-COMMEND  assigned  higher 
membership  values  in  cluster  1  to  some  points  from  cluster  2  (light  blue  shades  in  figure  6.11(b)). 
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(a)  CDSU 


(b)  P-COMMEND 

Figure  6.10:  True  (solid  lines)  and  estimated  (dashed  lines)  endmembers  for  the  UsgslC2M3  data 
with  SNR  =  20  dB  using  CDSU  and  P-COMMEND. 
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(a)  CDSU  (b)  P-COMMEND 

Figure  6.11:  Membership  values  in  cluster  1  for  the  UsgslC2M3  data  with  SNR  =  20  dB.  Two 
principal  components  of  the  data  are  scatter  plotted. 


(a)  CDSU  (b)  P-COMMEND 

Figure  6.12:  Membership  values  in  cluster  2  for  the  UsgslC2M3  data  with  SNR  =  20  dB.  Two 
principal  components  of  the  data  are  scatter  plotted. 


And  due  to  the  sum  to  one  constraint  on  the  memberships,  the  same  points  were  assigned  lower 
membership  values,  compared  to  CDSU,  in  cluster  2  that  they  belong  to  (light  green  shades  in  figure 
6.12(b)).  Since  these  membership  values  are  used  in  the  update  equations  of  the  endnrembers  and 
abundances,  the  parameters  estimated  using  CDSU  are  more  accurate  than  those  estimated  using 
P-COMMEND. 

Recall  that  the  update  equation  of  the  memberships  for  P-COMMEND  is: 

[(xi  -  PiiEi)(xJ-  -  PijE;)T]  1=55 

Uij  =  — - — ,  (6.5) 

S  [(Xt  ~  PgjEgXXj  —  PqjEq)T } 1_m 

9=1 
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Figure  6.13:  Running  time  (in  seconds)  of  CDSU  and  P-COMMEND  on  the  USGS  simulated  data 
across  the  25  runs  and  at  all  noise  levels. 

and  the  update  equation  of  the  memberships  for  CDSU  is: 

[(Xj  -  C i)(xj  -  Ci)T  +  a(xj  -  pijE i)(xj  -  PijEj)T]  ^ 

Uij  =  — - — .  (6.6) 

E  [(xj  -  cg)(xj  ^  C?)T  +  «(xj  “  PgjEg)(xj  ~  Pgj^gV]^ 
g— i 

By  examining  (6.6),  we  notice  that  a  pixel  j  will  have  a  high  membership  in  cluster  i  if:  (i)  its 
spectrum  is  close  to  the  centroid,  c,,  of  that  cluster  in  the  feature  space;  and  (ii)  it  fits  the  model 
of  that  cluster.  This  is  unlike  the  update  equation  of  the  memberships  for  P-COMMEND  in  (6.5), 
where  a  pixel  j  will  have  a  high  membership  in  model  i  only  if  it  fits  that  model,  regardless  of 
the  spectral  distribution  around  it.  This  illustrates  the  importance  of  the  clustering  term  in  taking 
the  distribution  of  the  data  into  account  and  thus  leading  to  better  endmember  and  abundance 
estimates. 

To  compare  the  time  complexity  of  CDSU  and  P-COMMEND,  we  generate  a  box  plot  of 
the  running  time,  expressed  in  seconds,  across  the  25  runs  and  at  all  noise  levels.  Figure  6.13 
shows  this  box  plot.  We  notice  that  the  CDSU  algorithm  takes  a  longer  time  to  run  compared  to 
P-COMMEND  for  the  cases  where  SNR  =  30  or  50  dB.  The  reason  for  this  is  that  CDSU  requires 
more  computations  than  P-COMMEND.  For  the  SNR  =  20  dB  case,  we  see  that  CDSU  becomes 
slightly  faster  than  P-COMMEND.  This  can  be  explained  by  the  clustering  term  in  CDSU  which 
makes  it  more  robust  to  noise  and  thus  requiring  less  time  to  converge. 

We  use  the  UsgslC2M3  simulated  data  set,  with  SNR  =  20  dB,  to  illustrate  another 
advantage  of  our  proposed  approach.  One  might  argue  why  not  cluster  the  data  first,  then  use  a 
single  convex  region  unmixing  algorithm,  such  as  ICE,  to  find  the  endmembers  and  abundances 
of  each  cluster.  In  other  words,  what  is  the  advantage  of  performing  unmixing  and  clustering 


e  P 


-  CDSU  P-COMMEND  CDSU  P-COMMEND  CDSU  P-COMMENfl 
SNR=20dB  SNR=20dB  SNR=30dB  SNR=30dB  SNR=50dB  SNR=50dB 


62 


simultaneously. 

Figure  6.14  shows  the  clustering  results  of  applying  the  FCM  algorithm  to  the  simulated  data  with 
SNR  =  20  dB.  We  plot  the  spectral  signatures  of  the  two  resulting  clusters,  labeling  each  point 
of  them  with  the  ground  truth,  i.e.,  the  convex  set  it  originally  came  from.  We  clearly  see  that 
the  resulting  clusters  include  a  mixture  of  points  originally  generated  from  both  endmember  sets, 
and  hence,  any  unmixing  algorithm  applied  to  each  cluster  would  result  in  erroneous  endmember 
estimates.  We  should  note  here  that  the  erroneous  partition  is  not  due  to  sensitivity  of  FCM  to 
initialization.  In  fact,  even  when  we  initialize  the  FCM  with  the  true  cluster  assignments,  it  converges 
to  the  same  result  of  figure  6.14. 

The  partition  obtained  using  CDSU  is  shown  in  figure  6.15.  As  it  can  be  seen,  unlike  the  FCM,  CDSU 
partitioned  the  data  into  2  pure  clusters  even  though  it  was  initialized  with  the  FCM.  Consequently, 
using  this  correct  partition,  it  is  more  likely  to  estimate  the  correct  endmembers. 

In  order  to  understand  why  FCM  leads  to  mixed  clusters  whereas  CDSU  succeeds  to  return 
pure  ones,  we  track  the  objective  function  values  of  both  algorithms  as  they  run.  Figure  6.16  shows 
the  evolution  of  the  objective  function  of  the  FCM  until  its  convergence.  As  it  can  be  seen,  it  reaches 
a  local  minimum  in  few  iterations.  For  the  CDSU,  we  track  the  first  term  of  the  objective  function 
in  (3.7)  (same  as  the  FCM  objective  function),  and  the  sum  of  all  terms.  As  it  can  be  seen  in  figure 
6.17,  the  sum  of  the  terms  decreases  as  the  algorithm  evolves.  However,  the  first  term  increases. 
In  other  words,  the  optimal  partition  is  the  one  that  takes  into  account  both  data  partitioning  and 
model  fitting. 

Using  two  dimensional  toy  data  and  simulated  hyperspectral  data,  we  have  illustrated  the 
effectiveness  of  the  proposed  CDSU  algorithm  in  identifying  correct  abundances  and  endmember 
sets.  We  have  also  shown  that  CDSU  outperformed  P-COMMEND  in  the  case  of  noisy  data. 
Furthermore,  we  showed  the  difference  between  our  approach,  that  performs  spectral  unmixing  while 
simultaneously  taking  into  account  the  distribution  of  the  data  in  the  spectral  space,  and  simply 
clustering  the  data  and  then  unmixing  each  cluster  separately.  In  the  next  section,  we  present  the 
results  of  CDSU  on  a  real  hyperspectral  data  and  compare  them  to  those  of  P-COMMEND. 

6.2.2  Evaluation  using  real  data 

In  this  section,  we  consider  a  down  sampled  version,  of  size  305  x  170  pixels,  of  the  Pavia 
University  real  hyperspectral  data  set  described  in  Section  6.1.2.  We  run  the  CDSU  and  P- 
COMMEND  algorithms  on  this  data  set  using  the  parameters  in  table  6.3. 
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Figure  6.14:  Spectral  signatures  of  the  clusters  generated  by  the  FCM  algorithm  on  the  UsgslC2M3 
data  with  SNR  =  20  dB. 
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Figure  6.15:  Spectral  signatures  of  the  clusters  generated  by  the  CDSU  algorithm  on  the 
UsgslC2M3  data  with  SNR  =  20  dB. 
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Figure  6.16:  Evolution  of  the  FCM  objective  function  on  the  UsgslC2M3  data  (SNR  =  20  dB). 


Figure  6.17:  Evolution  of  the  objective  function  of  CDSU  on  the  UsgslC2M3  data  (SNR  =  20 
dB). 


TABLE  6.3 


Parameters  used  for  the  CDSU  and  P-COMMEND  algorithms  on  the  Pavia  University  data 


Parameters 

CDSU 

P-COMMEND 

c 

3 

3 

M 

3 

3 

m 

1.25 

1.25 

a,  (3 

a  =  1,  /3=  [5,5,5] 

a  =  5 

Stopping  criterion 
( Iter  =  Iteration  number) 

i  ( JcDSu(Iter+l)-JCDSu(Iter)\  ^  ir,_6 

1  JpcoMMEND(Iter+l)-JpcoMMEND(Iter)  \  i  n— 6 

Q0S  (  Jcnsu(Iter+ 1)  )  <  iU 

\  JpCOMMEND{Iter+ 1)  J 
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(a)  CDSU  (b)  P-COMMEND 

Figure  6.18:  Cluster  assignments  of  CDSU  and  P-COMMEND  on  the  Pavia  University  data 


TABLE  6.4 

Composition  of  the  clusters  generated  by  P-COMMEND  and  CDSU,  in  percentage,  for  the  Pavia 
University  data,  taking  into  account  the  labeled  points  only 


- Class 

Cluster 

Asphalt 

Meadows 

Gravel 

Trees 

Painted  metal  sheets 

Bare  soil 

Bitumen 

Self  blocking  bricks 

Shadows 

P-COMMEND 

Cluster  1 

16.01 

22.78 

3.19 

35.36 

0 

7.14 

0.19 

15.30 

0 

Cluster  2 

0.57 

76.74 

0.03 

0.51 

0 

17.74 

0.01 

0.18 

4.17 

Cluster  3 

40.89 

0 

14.58 

0.03 

10.66 

4.71 

10.38 

18.66 

0.06 

CDSU 

Cluster  1 

45.54 

21.46 

3.98 

0.24 

0.03 

10.29 

9.23 

2.32 

6.88 

Cluster  2 

0.05 

73.08 

0 

14.71 

0 

12.14 

0 

0 

0 

Cluster  3 

6.45 

6.17 

18.32 

0.04 

15.49 

13.36 

1.16 

38.97 

0 

Figures  6.18(a)  and  (b)  show  the  images  of  cluster  assignments  for  both  algorithms.  Cluster  assign¬ 
ments  were  based  on  the  highest  membership  values.  As  it  can  be  seen,  the  clusters  generated  by 
CDSU  are  spatially  more  coherent. 

Table  6.4  shows  the  compositions  of  the  clusters  generated  by  P-COMMEND  and  CDSU,  in  per¬ 
centage,  taking  into  account  the  labeled  points  only.  The  same  compositions  are  illustrated  visually 
in  figures  6.19(a)  and  (b).  As  it  can  be  seen,  the  “Trees”,  “Painted  Metal  Sheets”,  “Bitumen” 
and  “Shadows”  classes  were  assigned  to  unique  clusters  by  both  algorithms.  On  the  other  hand,  the 
“Meadows” ,  “Gravel”  and  “Bare  soil”  classes  were  divided  into  more  than  one  cluster  by  both  algo- 
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Figure  6.19:  Composition,  in  percentage,  of  the  clusters  generated  by  CDSU  and  P-COMMEND  on 
the  Pavia  University  data,  taking  into  account  the  labeled  points  only. 

rithms.  Finally,  the  “Asphalt”  and  “Self  blocking  bricks”  classes  were  divided  into  more  than  one 
cluster  by  P-COMMEND  and  less  divided  by  CDSU.  This  is  mainly  due  to  the  spectral  similarity 
that  CDSU  takes  into  account  when  partitioning  the  data,  unlike  P-COMMEND. 

The  proportion  maps  associated  with  the  three  endmembers  for  each  of  the  three  clusters 
found  by  P-COMMEND  and  CDSU  are  shown  in  figures  6.20  and  6.21,  respectively.  Each  pixel  in 
the  proportion  maps  was  multiplied  by  the  corresponding  membership  value  for  each  context  so  that 
the  corresponding  endmember  with  high  proportion  is  highlighted.  Since  we  do  not  have  labels  for 
the  entire  image,  we  analyze  these  proportion  maps  by  comparing  them  to  the  RGB  image  in  figure 
6.3(a),  labeling  each  map  with  the  dominant  material  it  represents. 

It  can  be  seen  that  the  endmembers  found  by  P-COMMEND  are  duplicated  in  more  than  one 
cluster.  For  instance,  endmembers  corresponding  to  shadows,  tall  grass  and  trees  are  found  in 
cluster  1  and  2.  Furthermore,  the  endmember  for  cement  is  found  in  clusters  1  and  3.  Moreover, 
some  endmembers  represent  non  coherent  elements,  like  combining  asphalt  and  bitumen  with  cement 
in  one  endmember.  In  contrast,  the  endmembers  found  by  CDSU  are  not  duplicated  in  the  three 
contexts  and  they  represent  coherent  elements. 


Cluster  1 


Cluster  2  Cluster  3 
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(a)  Tall  grass  +  Trees 


20  40  EO  80  100  120  140  100 


(d)  Shadows 


20  40  BO  80  100  120  140  100 
(b)  Shadows 


(e)  Tall  grass 


(c)  Cement 


(f)  Roof  bricks 


20  40  60  00  100  120  140  100 


(g)  Bright  objects  (h)  Asphalt  +  Bitumen  +  Cement  (i)  Roof  metal  sheets 

Figure  6.20:  Proportion  maps  estimated  by  the  P-COMMEND  algorithm  for  the  Pavia  Univer¬ 
sity  data.  Each  row  of  3  proportion  maps  represents  one  context,  and  each  column  represents  an 
endmember  in  that  context. 
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(c)  Uneven  bare  soil 


(f)  Short  grass 


20  40  60  80  100  120  140  160 


(g)  Bright  objects  (h)  Roof  metal  sheets  (i)  Cement  +  Roof  bricks 

Figure  6.21:  Proportion  maps  estimated  by  the  CDSU  algorithm  for  the  Pavia  University  data. 
Each  row  of  3  proportion  maps  represents  one  context,  and  each  column  represents  an  endmember 
in  that  context. 
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Figures  6.22(a)  and  (b)  show  the  estimated  endmembers  by  CDSU  and  P-COMMEND, 
respectively.  As  it  can  be  seen,  both  algorithms  identified  similar  endmembers  like  the  ones  corre¬ 
sponding  to  metal  roofs,  bright  objects,  shadows,  and  tall  grass  and  trees.  However,  P-COMMEND 
generated  an  almost  duplicate  endmember  for  shadows,  tall  grass  and  trees.  P-COMMEND  also 
missed  endmembers  for  bare  soil  that  CDSU  was  able  to  identify. 


(a)  CDSU 


(b)  P-COMMEND 


Figure  6.22:  Estimated  endmembers  for  the  Pavia  University  data  using  CDSU  and  P- 
COMMEND. 
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In  this  section,  we  presented  the  results  of  the  proposed  CDSU  algorithm  on  real  hyper- 
spectral  data  and  compared  them  to  those  of  P-COMMEND.  Both  methods  did  agree  on  some 
endmembers.  However,  while  P-COMMEND  resulted  in  some  duplicate  and  non  coherent  endmem- 
bers,  CDSU  identified  coherent  and  diverse  endmembers  representing  meaningful  elements  in  the 
scene. 

6.3  Context  Dependent  Spectral  Unmixing  Using  the  Mahalanobis  Distance 

The  Context  Dependent  Spectral  Unmixing  using  the  Mahalanobis  distance  (CDSUm)  is  a 
variation  of  CDSU  that  allows  more  flexibility  in  the  shape  of  the  clusters.  It  accounts  for  ellipsoidal 
shapes  beside  the  traditional  spherical  shape.  The  experiment  in  this  section  is  designed  to  illustrate 
the  advantage  of  CDSUm  over  CDSU  using  a  simple  toy  data  set. 

We  use  the  D2EC2M3  synthetic  data  set  described  in  Section  6. 1.1.1.  CDSU  and  CDSUm 
were  ran  using  the  same  parameters  values:  C  =  2,  M  =  3,  m  =  2,  a  =  40,  and  /3  =  [0.2, 0.2].  For 
CDSUm,  ct,  was  set  to  1  for  i  =  1, 2. 

Figure  6.23  illustrates  the  results  of  both  algorithms.  The  retrieved  clusters  are  represented 
in  different  colors,  and  the  endmembers  are  represented  in  red  and  green  X’s.  Due  to  the  “non- 
spherical”  nature  of  the  clusters,  CDSU  failed  to  identify  the  correct  cluster  assignments.  This 
made  it  fail  to  identify  appropriate  endmembers.  On  the  other  hand,  CDSUm  successfully  identified 
both  t 

4.5 

4 

3.5 

3 

2.5 

2 


(a)  CDSU  (b)  CDSUm 

Figure  6.23:  Results  of  CDSU  and  CDSUm  on  the  D2EC2M3  data. 
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(a)  RGB  image  (b)  Segmented  image 

Figure  6.24:  The  University  of  Southern  Mississippi  data. 


6.4  Cluster  Constrained  Multi-Model  Unmixing 

The  Cluster  Constrained  Multi-Model  Unmixing  algorithm  is  a  semi-supervised  variation  of 
CDSU  where  partial  supervision  information,  in  the  form  of  cluster  assignment  constraints,  is  used  to 
guide  the  search  process  and  avoid  local  minimum  solutions.  In  order  to  generate  such  constraints, 
we  needed  data  collected  by  two  sensors,  specifically  hyperspectral  and  LIDAR  sensors.  We  use 
a  down  sampled  version,  of  size  163  x  169  pixels,  of  the  University  of  Southern  Mississippi 
hyperspectral  and  LIDAR  data  described  in  Section  6.1.2. 

First,  the  LIDAR  image  is  segmented  using  the  Digital  Elevation  Maps  (DEM)  to  partition 
it  into  different  elevation  levels.  Then,  the  vegetation  regions  are  identified  using  the  normalized 
difference  vegetation  index  NDVI  on  the  co-located  hyperspectral  image.  Finally,  the  shadowed 
regions  are  identified  using  the  altitude  of  the  plane,  the  position  of  the  sun  at  the  time  of  the 
collection,  and  the  DEM  [94].  Figure  6.24  illustrates  the  RGB  image  of  the  area  and  the  resulting 
segmented  image.  According  to  [94],  the  identified  segments  correspond  to:  (0)  unlabeled,  (1) 
ground/impervious,  (2)  ground/pervious,  (3)  ground/shadow,  (4)  trees,  (5)  buildings,  (6)  beach  and 
(7)  calibration  tarps.  The  segmented  image  is  used  to  construct  a  set  of  should-link  constraints  by 
selecting  pairs  of  pixels  that  belong  to  the  same  segment.  The  image  has  a  total  of  27547  pixels,  and 
a  subset  of  pixels  were  selected  randomly  to  create  a  set  of  constraints  that  includes  6840  should-link 
pairs.  The  dimensionality  of  the  hyperspectral  data  was  reduced  from  68  to  30  using  the  Ward’s 
linkage  strategy  with  divergence  from  [95]. 

To  illustrate  the  advantage  of  using  partial  supervision,  we  compare  the  performance  of 
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CC-MMU  and  CDSU.  We  ran  these  algorithms  using  the  parameters  in  table  6.5.  Due  to  the  high 
dimensionality  of  the  feature  space,  we  set  the  off-diagonal  elements  of  the  norm  matrices  A;  to 
zero. 


TABLE  6.5 

Parameters  used  for  the  CC-MMU  and  CDSU  algorithms  on  the  University  of  Southern  Mis¬ 
sissippi  data 


Parameters 

C 

M 

m 

a 

P 

7 

5 

< 

<s>. 

Pjki  Yj,  k 

CC-MMU 

3 

3 

2 

4 

[10,  10,  10] 

0.3 

1 

1 

CDSU 

3 

3 

2 

4 

[10,  10,  10] 

N/A 

Figures  6.25  and  6.26  illustrate  the  proportion  maps  resulting  from  CC-MMU  and  CDSU 
respectively.  The  proportions  were  multiplied  by  the  corresponding  cluster  memberships  in  order  to 
highlight  pixels  from  that  cluster.  The  values  are  displayed  as  a  heat  map  where  small  values  are 
shown  in  dark  blue  and  large  values  in  dark  red.  Each  proportion  map  is  labeled  with  the  dominant 
material  it  represents.  In  absence  of  ground  truth,  endmember  labeling  was  done  by  comparing  the 
proportion  maps  to  the  RGB  image  in  figure  6.24(a).  It  can  be  seen  that  cluster  2  from  CC-MMU 
(2nd  row  in  figure  6.25)  and  cluster  1  from  CDSU  (1st  row  in  figure  6.26)  are  identical.  However,  the 
two  remaining  clusters  are  different.  CDSU  combined  bare  soil  and  cement  in  one  endmember  in 
cluster  3  (3rd  row  in  figure  6.26),  whereas  CC-MMU  succeeded  in  identifying  different  endmembers 
for  those  materials  (3rd  row  in  figure  6.25).  We  also  notice  that  CC-MMU  combined  man-made 
materials  (asphalt,  bitumen,  cement)  in  one  cluster  (3rd  row  in  figure  6.25),  as  opposed  to  CDSU 
which  divided  them  into  two  different  clusters  (2nd  and  3rd  rows  in  figure  6.26). 

We  also  provide  a  comparison  of  running  CDSU,  CDSUm  and  CC-MMU  using  the  Euclidean 
and  the  Mahalanobis  distances  by  reporting  the  number  of  satisfied  constraints  in  table  6.6.  We 

TABLE  6.6 

Number  of  satisfied  constraints  using  CC-MMU  and  CDSU  with  Euclidean  and  Mahalanobis  dis¬ 
tances  on  the  University  of  Southern  Mississippi  data 


Semi-supervised 

Unsupervised 

CC-MMU  (Euclidean) 

CC-MMU  (Mahalanobis) 

CDSU 

CDSUm 

4842  /  6840 

6591  /  6840 

4755  /  6840 

6263  /  6840 

notice  that  running  CC-MMU  using  the  Mahalanobis  distance  yields  more  satisfied  constraints. 
This  is  expected  since  the  Mahalanobis  distance  allows  for  more  degrees  of  freedom  by  seeking 
ellipsoidal  clusters  instead  of  spherical  ones  as  with  the  Euclidean  distance.  The  additional  degrees 
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Figure  6.25:  Proportion  maps  from  CC-MMU  on  the  University  of  Southern  Mississippi  data. 
Rows  correspond  to  clusters  and  columns  correspond  to  endmembers. 
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Figure  6.26:  Proportion  maps  from  CDSU  on  the  University  of  Southern  Mississippi  data. 
Rows  correspond  to  clusters  and  columns  correspond  to  endmembers. 
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of  freedom  enable  the  supervision  constraints  to  have  more  influence  on  the  final  partition.  It  is  worth 
mentioning  here  that  the  proportion  maps  for  CDSUm  and  CC-MMU  with  the  Mahalanobis  distance 
are  fairly  similar.  Moreover,  the  proportion  maps  for  CDSU  and  CC-MMU  using  the  Euclidean 
distance  are  also  fairly  similar.  This  indicates  that,  for  this  data,  the  Mahalanobis  distance  is  more 
influential  than  the  semi-supervision  information  for  the  clusters  formation.  Nevertheless,  CC-MMU 
can  still  be  helpful  for  other  data  sets  or  with  other  supervision  information  as  it  offers  the  option 
to  incorporate  domain  and  expert  knowledge  into  the  unmixing  process. 

6.5  Proportion  Constrained  Multi-Model  Unmixing 

The  Proportion  Constrained  Multi-Model  Unmixing  (PC-MMU)  algorithm  is  another  semi- 
supervised  version  of  CDSU  where  partial  supervision  information,  in  the  form  of  constraints  based 
on  proportions,  is  used  to  guide  the  search  process  and  avoid  local  minimum  solutions. 

We  use  the  same  data  of  the  previous  section  to  test  PC-MMU.  Two  cases  of  generating  the 
constraints  are  used.  In  the  first  case,  the  set  of  should-link  constraints  of  the  previous  section  is 
used  as  the  set  of  points  constrained  to  have  similar  proportions  in  the  extracted  endmembers.  In 
the  second  case,  we  use  the  consensus  analysis  presented  in  Section  4.2  to  form  the  set  of  constraints. 
We  randomly  select  5000  pairs  of  voters,  that  have  high  co-association  values,  from  the  set  "Ha.  The 
voters  are  points  having  a  high  proportion  value  in  any  extracted  endmember  of  any  of  the  unmixing 
algorithms.  The  co-association  value  of  a  pair  of  voters  is  obtained  over  the  entire  unmixing  ensemble, 
forming  a  new  similarity  measure  based  on  the  proportions.  A  pair  of  voters  that  have  a  high  co¬ 
association  value  is  likely  to  have  similar  proportions  in  the  extracted  endmembers. 

For  both  cases,  we  use  the  same  parameters  as  for  CC-MMU  in  Section  6.4  (table  6.5). 
The  resulting  proportion  maps  are  fairly  similar  to  the  ones  resulting  from  CC-MMU  (figure  6.25). 
Hence,  the  same  conclusions  apply  here. 

6.6  Robust  Context  Dependent  Spectral  Unmixing 

The  Robust  Context  Dependent  Spectral  Unmixing  (RCDSU)  is  a  variation  of  CDSU  that 
can  handle  noisy  data.  It  is  based  on  the  use  of  possibilistic  membership  functions  along  with  the 
fuzzy  membership  functions  of  CDSU.  The  fuzzy  memberships  are  used  to  partition  the  spectra 
space  into  multiple  convex  sets  that  span  the  entire  space  and  avoid  coincident  clusters,  while  the 
possibilistic  memberships  are  used  to  reduce  the  effect  of  noise  and  obtain  robust  estimates  of  the 
endmembers  and  proportions  within  each  cluster.  In  all  results  reported  in  this  chapter,  we  update 
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the  parameters  rji  every  iteration  using  [85] 

rji  =  mean{costij ,  iy  >  tf},  (6-7) 

where  costij  is  given  by  (3.31),  and  tf  is  the  Qth  percentile  of  the  typicalities  in  cluster  i.  Q  can 
be  thought  of  as  the  percentage  of  points  not  belonging  to  cluster  i.  In  this  experiment,  we  let 
Q  =  100  —  Furthermore,  the  convergence  of  the  algorithm  is  checked  by  comparing  the  values 
of  the  objective  function  from  successive  iterations.  If  the  difference  is  below  some  threshold,  the 
algorithm  is  stopped. 

We  design  the  following  experiment  so  that  we  evaluate  the  ability  of  RCDSU  to  handle 
noisy  data.  We  use  the  Usgs2C2M3  simulated  hyperspectral  data  sets  described  in  Section  6. 1.1. 2. 

We  run  P-COMMEND,  CDSUm  and  RCDSU  25  times  using  the  parameters  in  table  6.7. 
When  the  data  is  expected  to  be  noisy,  we  set  a  higher  weight  to  the  possibilistic  memberships 
( b  =  0.9)  compared  to  the  fuzzy  memberships  (a  =  0.1).  Otherwise,  we  set  a  =  0.9  and  b  =  0.1. 

TABLE  6.7 

Parameters  used  for  P-COMMEND,  CDSUm  and  RCDSU  on  the  Usgs2C2M3  data 


C 

M 

m 

n  a  b  Pi,  Vi 

> 

b 

a 

P-COMMEND 

2 

3 

1.5 

N/A 

i 

CDSUm 

2 

3 

1.5 

N/A 

1 

1 

100 

RCDSU 

2 

3 

1.5 

1.5  0.1  0.9 

1 

1 

100 

The  estimated  endmembers  from  all  algorithms  are  compared  to  the  true  endmembers  using 
the  spectral  mean  angle  error  (SMAE)  defined  in  equation  (6.4).  We  report  the  average  and  standard 
deviation  of  the  SMAE  of  the  resulting  endmember  estimates  from  all  algorithms  in  all  runs  in  table 
6.8.  A  two-sample  t-test  at  the  5%  significance  level  shows  that,  for  noisy  data,  RCDSU  provides 
significantly  better  endmember  estimates  than  P-COMMEND  and  CDSUm  (p- value  <  le-20).  In 

TABLE  6.8 

Average  (±  standard  deviation)  of  the  SMAE  over  25  runs  for  P-COMMEND,  CDSUm  and  RCDSU 
on  the  Usgs2C2M3  data 


%  of  noise  points 

0 

5 

10 

P-COMMEND 

0.0630  (±0) 

0.1609  (±0.0046) 

0.1644  (±0.0135) 

CDSUm 

0.0630  (±0) 

0.1609  (±0.0046) 

0.1662  (±0.0119) 

RCDSU 

0.0639  (±0) 

0.0681  (±0.0074) 

0.0757  (±0.0105) 

figure  6.27,  we  illustrate  the  estimated  (dashed  line)  versus  the  true  (solid  line)  endmembers  for 
CDSUm  and  RCDSU  for  the  Usgs2C2M3  data  with  5%  noise  points.  As  it  can  be  seen,  the 
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(a)  CDSUm 


(b)  RCDSU 


Figure  6.27:  True  (solid  line)  and  estimated  (dashed  line)  endnrembers  of  CDSUm  and  RCDSU  for 
the  Usgs2C2M3  data  with  5%  noise  points. 

RCDSU  was  not  influenced  by  the  presence  of  the  noise  points  unlike  CDSUm  which  got  affected 
and  resulted  in  erroneous  estimates. 

To  verify  the  ability  of  RCDSU  to  identify  noise  points,  after  convergence,  we  identified 
points  with  small  (<  0.1)  possibilistic  memberships  in  all  clusters.  All  of  these  points  correspond  to 
the  points  with  added  noise. 
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Figure  6.28:  Initial  cluster  assignments  on  the  Usgs2C2M3  data  with  5%  noise  points  using  RCDSU 
with  Cmax  =  5. 

6.7  Unsupervised  Robust  Context  Dependent  Spectral  Unmixing 

So  far,  we  assumed  that  the  number  of  contexts  is  known.  If  this  is  not  the  case,  we  use 
the  proposed  Unsupervised  Robust  Context  Dependent  Spectral  Unmixing  (U-RCDSU).  U-RCDSU 
takes  advantage  of  the  fact  that  the  possibilistic  membership  functions  are  not  constrained  to  sum 
to  one,  and  hence,  small  clusters  covering  the  same  dense  regions  would  expand  and  become  similar. 
These  are  then  merged  and  the  number  of  clusters,  initially  overspecified,  is  reduced  until  it  reaches 
the  optimal  number.  In  the  following,  we  evaluate  the  ability  of  U-RCDSU  to  determine  the  correct 
number  of  contexts  using  simulated  and  real  data. 

6.7.1  Evaluation  using  simulated  data 

We  use  the  same  data  (with  5%  noise  points)  and  parameters  of  Section  6.6  and  we  run 
U-RCDSU  by  overspecifying  the  number  of  clusters  to  Cmax  =  5.  We  let  e  =  0.1.  The  initial  cluster 
assignments  are  shown  in  figure  6.28,  where  we  plot  the  2  principal  components  (PC)  of  the  data, 
labeling  each  point  with  its  cluster  assignment.  Figure  6.29  shows  which  clusters  got  merged  after 
each  iteration  of  the  U-RCDSU  algorithm.  Figure  6.30  illustrates  the  final  cluster  assignments  of 
the  data.  As  it  can  be  seen,  the  algorithm  converged  to  C  =  2.  The  resulting  endmembers  are  shown 
in  figure  6.31.  As  it  can  be  seen,  the  identified  endmembers  are  unaffected  by  the  noise  points,  and 
are  similar  to  the  results  of  RCDSU  in  figure  6.27(b). 
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Figure  6.29:  Cluster  merging  on  the  Usgs2C2M3  data  with  5%  noise  points  using  U-RCDSU. 


Figure  6.30:  Final  cluster  assignments  on  the  Usgs2C2M3  data  with  5%  noise  points  using  U- 
RCDSU. 

6.7.2  Evaluation  using  real  data 

We  use  two  real  hyperspectral  data  sets:  the  Pavia  University  and  the  University  of 
Southern  Mississippi  data. 

6. 7. 2.1  Pavia  University  data 

We  use  a  down  sampled  version,  of  size  204  x  114  pixels,  of  the  Pavia  University  data 
described  in  Section  6.1.2.  We  run  the  unsupervised  RCDSU  algorithm  using  Cmax  =  10,  M  =  3, 
m  =  n  =  1.5,  a  =  b  =  0.5,  a  =  100,  /3*  =  4,  Vi,  <7j  =  1,  Vi  and  e  =  0.1.  The  algorithm  converged 
to  C  =  2.  Figure  6.32  shows  the  evolution  of  the  cluster  assignments  of  the  data  points  as  U- 
RCDSU  runs.  We  plot  2  principal  components  (PC)  of  the  data,  labeling  each  point  with  its  cluster 
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Figure  6.31:  True  (solid  line)  and  estimated  (dashed  line)  endmembers  for  the  Usgs2C2M3  data 
with  5%  noise  points  using  U-RCDSU. 

assignment.  Figure  6.32(a)  shows  the  cluster  assignments  after  running  RCDSU  with  a  =  b  =  0.5 
and  Cmax  =  10.  Figure  6.32(b)  shows  the  cluster  assignments  after  running  RCDSU  with  a  =  0, 
6  =  1  and  Cmax  =  10.  Figure  6.32(c)  shows  the  cluster  assignments  after  one  more  iteration  of 
U-RCDSU.  As  it  can  be  seen,  we  went  from  10  to  4  clusters.  Finally,  figure  6.32(d)  shows  the  final 
cluster  assignments  after  convergence  (1  more  iteration).  We  went  from  4  to  2  clusters. 

Since  the  ground  truth  is  not  available  for  the  entire  scene,  we  evaluate  the  performance 
of  the  algorithm  qualitatively  by  displaying  the  resulting  proportion  maps  and  interpreting  them 
based  on  the  RGB  image  of  the  scene  shown  in  figure  6.3(a).  The  proportion  maps  associated  with 
the  three  endmembers  for  each  of  the  two  clusters  are  shown  in  figure  6.33.  The  proportions  were 
multiplied  by  the  corresponding  weighted  cluster  memberships  (aUij  +  btjj )  in  order  to  highlight 
pixels  from  that  cluster.  Dark  blue  represents  small  values  and  dark  red  represents  large  values. 
Each  proportion  map  is  labeled  with  the  dominant  material  it  represents. 

It  can  be  seen  that  U-RCDSU  resulted  in  two  intuitive  clusters  in  the  sense  that  one  cluster 
corresponds  to  natural  materials,  and  the  other  corresponds  to  man-made  materials.  The  three 
proportion  maps  of  context  1  (figure  6.33(a))  correspond  to  vegetation,  shadows  and  bare  soil,  re¬ 
spectively.  These  materials  represent  natural  regions  in  the  scene.  The  three  proportion  maps  of 
context  2  (figure  6.33(b))  correspond  to  brick  roofs,  metal  roofs,  and  asphalt  and  bitumen,  respec¬ 
tively.  They  represent  urban  regions  in  the  scene.  One  may  conclude  that  U-RCDSU  identified 
a  reasonable  number  of  contexts  with  coherent  content  and  appropriate  endmembers  within  each 
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(c)  Assignments  after  1  iteration.  (d)  Assignments  after  convergence  (iteration  2). 

Figure  6.32:  Evolution  of  the  cluster  assignments  using  U-RCDSU  with  Cmax  =  10  on  the  Pavia 
University  data. 

context. 

6. 7. 2. 2  University  of  Southern  Mississippi  data 

We  run  the  unsupervised  RCDSU  algorithm  on  the  other  real  data,  the  University  of 
Southern  Mississippi  hyperspectral  data  described  in  Section  6.1.2.  We  use  a  down  sampled 
version  of  size  163  x  169  pixels.  We  let  Cmax  =5,  M  =  3,  m  =  n  =  1.5,  a  =  b  =  0.5,  a  =  100, 
Pi  =  1,  Vi,  cj i  =  1,  Vi  and  e  =  0.2.  The  algorithm  converged  to  C  =  2.  Since  the  ground 
truth  is  not  available  for  this  scene,  we  evaluate  the  performance  of  the  algorithm  qualitatively  by 
displaying  the  resulting  proportion  maps  and  interpreting  them  based  on  the  RGB  image  of  the 
scene  shown  in  figure  6.24(a).  Each  proportion  map  was  multiplied  by  the  corresponding  weighted 
memberships  ( auij+btij ).  Figure  6.34  illustrates  these  maps  where  the  title  of  each  one  represents  the 
corresponding  dominant  materials.  For  this  data  also,  the  U-RCDSU  identified  a  reasonable  number 
of  contexts  with  coherent  content.  The  endmembers  in  cluster  1  correspond  to  grass,  asphalt  and 
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Grass  and  trees  Shadows 


Bare  soil 


Brick  roofs 


(a)  Proportion  maps  in  context  1 

Metal  roofs 


Asphalt  and  bitumen 


(b)  Proportion  maps  in  context  2 

Figure  6.33:  Proportion  maps  from  U-RCDSU  on  the  Pavia  University  data.  Rows  correspond  to 
clusters  and  columns  correspond  to  endmembers. 


84 


Grass 


Asphalt  +  Bitumen 


Beach  sand 


Bare  soil 
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Trees 


Figure  6.34:  Proportion  maps  from  U-RCDSU  on  the  University  of  Southern  Mississippi  data. 
Rows  correspond  to  clusters  and  columns  correspond  to  endmembers. 


bitumen,  and  beach  sand.  In  cluster  2,  the  endmembers  represent  bare  soil,  shadows  and  trees. 


6.8  Robust  Unmixing  Using  Consensus  Analysis 

Robust  Unmixing  using  Consensus  Analysis  is  based  on  the  idea  that  “optimal”  endmembers 
in  the  data  will  have  a  consensus  among  multiple  unmixing  results.  The  method  combines  results 
from  multiple  unmixing  algorithms  with  different  numbers  of  endmembers  to  find  the  optimal  end- 
members  using  consensus  analysis.  The  approach  overcomes  the  weaknesses  of  individual  algorithms 
and  provides  a  robust  alternative  to  estimating  the  endmembers  in  the  data.  First,  we  illustrate  the 
results  of  the  proposed  approach  on  the  Usgs3ClM4  simulated  data  described  in  Section  4.1.  Then, 
we  present  the  results  on  a  real  data.  The  pixels’  proportion  values,  for  the  unmixing  algorithms 
that  return  endmembers  only,  are  computed  through  a  least  squares  constrained  minimization  using 
Lagrange  Multipliers  optimization. 
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Figure  6.35:  iVAT  image  of  77a  of  the  Usgs3ClM4  data. 


6.8.1  Evaluation  using  simulated  data 

For  this  data  set,  we  set  Q  =  20,  ap  =  0.85  and  ac  =  95th  percentile  of  {C(j,  1)1  j  7^  l}-  This 
resulted  in  151  voters  (~  8%  of  the  data)  in  the  set  Ha-  First,  to  illustrate  the  fact  that  T~La  can  be 
easily  clustered,  we  use  the  Improved  Visual  Assessment  of  Cluster  Tendency  (iVAT)  [96]  algorithm 
to  visualize  it.  iVAT  takes  as  input  a  dissimilarity  matrix  D  and  reorders  its  elements  so  that  clusters 
can  be  visualized.  The  iVAT  image  is  shown  in  figure  6.35.  We  can  clearly  see  4  dark  blocks  on  the 
diagonal,  indicating  the  presence  of  4  clusters,  which  is  conform  to  the  true  number  of  endmembers 
used  to  generate  the  data.  In  figure  6.36,  we  scatter  plot  the  3  principal  components  (PC)  of  the 
data  (gray  dots)  and  the  voters  of  set  T~La  in  colored  dots.  Each  color  represents  one  cluster  of  voters. 
Here,  an  average  link  hierarchical  clustering  [90]  is  used  to  identify  the  4  clusters.  The  medoids  of 
the  clusters  are  selected  to  represent  the  extracted  endmembers.  These  are  illustrated  in  figure  6.36 
using  large  red  ’X’s.  The  large  green  ’X’s  represent  the  true  endmembers.  As  it  can  be  seen,  the 
extracted  endmembers  are  close  to  the  true  ones. 


6.8.2  Evaluation  using  real  data 

In  this  experiment,  we  use  a  down  sampled  version,  of  size  204x  114,  of  the  Pavia  University 
data  described  in  Section  6.1.2.  We  run  ICE  [61],  VCA  [66],  PPI  [59],  and  N-FINDR  [60]  using  a 
number  of  endmembers  varying  from  5  to  10,  repeating  each  run  4  times  (giving  a  total  of  96  runs). 
We  also  run  U-RCDSU  using  Cmax  =  5  with  M  =  2  and  M  =  3  (the  other  parameters  were  the 
same  as  in  Section  6. 7. 2.1),  leading  to  4  contexts  with  2  endmembers  each,  and  2  contexts  with  3 
endmembers  each,  respectively.  In  total,  we  have  102  endmember  sets. 

We  set  Q  =  50,  ap  =  0.85  and  ac  =  90th  percentile  of  {C(j,l)\  j  ^  l}.  This  resulted  in 
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Figure  6.36:  The  PC  of  the  Usgs3ClM4  data,  the  voters,  the  extracted  endmembers  and  the  true 
endmembers. 


1804  voters  (~  8%  of  the  data)  in  the  set  Ha-  The  iVAT  image  for  these,  illustrated  in  figure 
6.37,  shows  5  blocks  along  the  diagonal.  An  average  link  hierarchical  clustering  algorithm  identified 
the  5  clusters.  We  pick  the  medoid  of  each  cluster  as  the  representative  of  the  endmember.  We 
compute  the  proportion  values  of  these  endmembers  in  all  pixels  and  display  them  as  proportion 
maps  in  figure  6.38.  The  values  are  shown  as  a  heat  map  (dark  blue  represents  low  values  and 
dark  red  represents  high  values),  and  each  proportion  map  is  labeled  with  the  dominant  material  it 
represents.  It  can  be  seen  that  these  proportion  maps  are  similar  to  the  ones  in  figure  6.33  resulting 
from  U-RCDSU,  with  the  exception  that  U-RCDSU  resulted  in  separate  endmembers  for  shadows 
and  roads,  as  opposed  to  only  one  by  the  consensus  unmixing. 
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Figure  6.37:  iVAT  image  of  Ha  of  the  Pavia  University  data. 
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(a)  RGB 


(b)  ei=  “Vegetation” 


(c)  e2=  “Shadows+Roads” 


(d)  e3=  “Meadows”  (e)  e4=  “Metal  roofs”  (f)  e$=  “Bare  soil” 

Figure  6.38:  RGB  and  proportion  maps  of  the  Pavia  University  data  using  consensus  analysis. 


Given  the  ground  truth  data  of  some  areas  in  the  scene,  in  table  6.9,  we  report  the  classes 
of  the  labeled  points  having  proportion  values  greater  than  a  threshold  t.  For  each  endmember,  we 
show  the  threshold  t  and  the  number  n  of  these  points.  We  compare  the  proportions  generated  by 
the  consensus  approach  to  those  generated  by  N-FINDR,  VCA,  ICE  and  PPI,  using  5  endmembers. 


From  table  6.9,  we  notice  that  some  of  the  selected  individual  unmixing  methods  resulted 


88 


TABLE  6.9 


Comparison  of  high  proportion  pixels  (in  %)  among  ground  truth  classes  using  consensus  unmixing,  N- 
FINDR,  VGA,  ICE  and  PPI  on  the  Pavia  University  data 


~~  - — -  Class 

Endmembers  - 

Asphalt 

Meadows 

Trees 

Metal 

sheets 

Berne 

soil 

Shadows 

Self-blocking 

bricks 

Gravel 

Bitumen 

Consensus  unmixing 

ei 

“Vegetation”  (Fig.  6.38b),  t=0.5,  n=638 

0  % 

52.66  % 

47.18  % 

0  % 

0.16  % 

0  % 

0% 

0  % 

0  % 

e2 

“Shadows  +  Roads”  (Fig.  6.38c),  i=0.7,  n=141 

24.82  % 

0  % 

0% 

0  % 

0  % 

75.18  % 

0% 

0% 

0  % 

e3 

“Meadows”  (Fig.  6.38d),  t=0.6,  n=182 

1.10  % 

85.16  % 

0% 

0% 

13.19  % 

0% 

0% 

0  % 

0.55  % 

e>i 

“Metal  roofs”  (Fig.  6.38e),  t= 0.5,  n=135 

0% 

0  % 

0% 

100  % 

0  % 

0% 

0% 

0  % 

0  % 

e5 

“Bare  soil”  (Fig.  6.38f),  £=0.4,  n=113 

6.19  % 

0.88  % 

0% 

0  % 

84.07  % 

0% 

3.54  % 

5.31  % 

0  % 

N-FINDR 

ei 

“Bare  soil”,  £=0.2,  n=266 

6.77  % 

18.05  % 

0% 

0  % 

45.86  % 

0% 

15.04  % 

14.29  % 

0% 

e2 

“Metal  roofs”,  £=0.2,  n= 78 

0% 

0  % 

0% 

100  % 

0  % 

0% 

0% 

0  % 

0  % 

e3 

“Metal  roofs”,  £=0.4,  n=66 

0% 

0  % 

0  % 

100  % 

0  % 

0% 

0% 

0  % 

0  % 

e>i 

“Vegetation”,  £=0.6,  n=454 

0% 

50.88  % 

49.12  % 

0% 

0  % 

0% 

0% 

0  % 

0  % 

e5 

“Shadows  +  Roads”,  £=0.8,  rc=166 

35.54  % 

0  % 

0% 

0% 

0  % 

64.46  % 

0% 

0  % 

0  % 

VCA 

ei 

“Metal  roofs”,  £=0.5,  n=144 

0  % 

0  % 

0% 

100  % 

0  % 

0% 

0% 

0  % 

0  % 

e2 

“Vegetation”,  £=0.9,  n=473 

0% 

39.53  % 

60.47  % 

0% 

0  % 

0% 

0% 

0  % 

0% 

e3 

“Shadows”,  £=0.5,  n=104 

0% 

0  % 

0% 

0% 

0  % 

100  % 

0% 

0  % 

0  % 

e4 

“Bricks  +  Bare  soil”,  £=0.6,  n= 727 

5.23  % 

12.65  % 

0.14  % 

0.28  % 

21.46  % 

0% 

42.92  % 

17.33  % 

0  % 

e5 

“Roads” ,  £=0.3,  n=226 

94.69  % 

0  % 

0% 

0% 

0  % 

3.98  % 

0.44  % 

0  % 

0.88  % 

ICE 

ei 

“Shadows  +  Roads”,  £=0.4,  n=396 

53.03  % 

4.04  % 

0% 

0  % 

6.31  % 

26.77  % 

2.27  % 

2.27  % 

5.3% 

e2 

“Vegetation”,  £=0.6,  n=458 

0% 

51.31  % 

48.69  % 

0% 

0  % 

0% 

0% 

0% 

0% 

e3 

“Bare  soil”,  £=0.25,  n=100 

7% 

8% 

0% 

0% 

75  % 

0% 

4% 

6% 

0  % 

e4 

“Shadows  +  Roads”,  £=0.4,  n=165 

31.52  % 

0  % 

0% 

4.24  % 

0  % 

64.24  % 

0% 

0  % 

0  % 

e5 

“Metal  roofs”,  £=0.5,  n=104 

0% 

0  % 

0% 

100  % 

0  % 

0% 

0% 

0  % 

0  % 

PPI 

ei 

“Not  clear”,  £=0.1,  n=0 

0% 

0  % 

0% 

0  % 

0  % 

0  % 

0% 

0  % 

0  % 

e2 

“Meadows”,  £=0.5,  n=3084 

9.82  % 

51.95  % 

0.23  % 

0.03  % 

17.67  % 

0.71  % 

11.87  % 

7.39  % 

0.32  % 

e3 

“Not  clear”,  t=0.1,  n=0 

0% 

0  % 

0% 

0% 

0  % 

0% 

0% 

0  % 

0  % 

e4 

“Vegetation”,  £=0.9,  n=144 

0% 

22.22  % 

77.08  % 

0% 

0.69  % 

0% 

0% 

0% 

0  % 

e5 

“Metal  roofs”,  £=0.6,  n=26 

7.69  % 

0  % 

0% 

92.31  % 

0  % 

0% 

0% 

0  % 

0  % 

in  duplicated  endmembers  (N-FINDR  and  ICE),  or  missed  some  endmembers  (PPI).  VC  A,  on  the 
other  hand,  resulted  in  appropriate  endmembers.  The  consensus  unmixing  overcame  the  “erro¬ 
neous”  results  of  some  runs  by  only  considering  the  endmembers  over  which  there  was  a  consensus 
among  the  different  runs.  As  a  result,  the  consensus  unmixing  identified  more  robust  and  consistent 
endmembers. 

6.9  Context  Dependent  Hyperspectral  Subpixel  Target  Detection 

We  designed  these  experiments  to  evaluate  the  performance  of  the  proposed  context  depen¬ 
dent  (CD)  target  detectors  and  compare  them  to  the  traditional  detectors  that  use  a  single  end- 
member  set  to  describe  the  background.  For  the  traditional  detectors,  we  use  the  MVSA,  NFINDR, 
PPI  and  the  eigenvectors  of  the  data  correlation  matrix  method,  to  find  a  set  of  3  or  6  endmembers 
(referred  to  as  MVSA3  and  MVSA6,  NFINDR3  and  NFINDR6,  PPI3  and  PPI6,  and  EigVect3  and 
EigVect6  respectively) . 

6.9.1  Evaluation  using  implanted  targets 

We  use  the  Indian  Pines  data  described  in  Section  6.1.2.  We  implant  one  hundred  spectral 
signatures  of  a  red  tarp  target  with  abundance  values  ranging  from  0.1  to  1.  Figure  6.39  shows  band 
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Figure  6.39:  Band  115  of  the  Indian  Pines  data  with  100  implanted  targets. 


115  (~  1.5//m)  of  the  image  where  we  can  see  the  grid  of  100  targets.  The  top  row  consists  of  the 
targets  with  abundance  values  of  0.1.  These  are  difficult  to  see  since  they  have  low  proportion  (90% 
of  the  pixel  is  background).  The  next  row  consists  of  the  targets  with  abundance  values  of  0.2.  This 
pattern  continues  till  the  last  row  which  corresponds  to  pure  targets  of  abundance  values  of  1.  We 
run  U-RCDSU  using  Cmax  =  5,  M  =  3,  m  =  n  =  1.5,  a  =  0.1,  b  =  0.9,  a  =  100,  /?*  =  1,  Vi, 
a*  =  1,  Vi  and  e  =  0.1.  The  algorithm  converges  to  C  =  2  clusters. 

We  start  by  analyzing  the  resulting  possibilistic  memberships  of  the  target  pixels  in  both 
clusters.  Figure  6.40  shows  a  scatter  plot  of  the  maximum  possibilistic  memberships  of  the  targets  in 
both  clusters  as  a  function  of  their  proportions  in  the  pixels.  All  proportions  are  very  small  (<  0.06), 
which  means  that  the  targets  did  not  contribute  to  the  estimated  local  background  subspaces.  In 
other  words,  there  was  no  leakage  of  targets  into  the  endmember  sets.  We  can  also  notice  that  the 
larger  the  proportion  of  the  target,  the  smaller  its  possibilistic  membership,  which  is  expected.  These 
memberships  could  be  used  on  their  own  to  detect  targets.  However,  this  would  result  in  a  large 
number  of  false  alarms,  since  the  memberships  cannot  discriminate  between  targets  and  non-target 
outliers. 

In  figure  6.41,  we  scatter  plot  the  detection  statistic  of  CD-OSP  as  a  function  of  the  target 
proportion  in  the  pixel.  We  notice  that  the  statistic  increases  as  the  target  proportion  increases, 
which  is  expected. 

The  performance  of  the  detectors  is  evaluated  using  the  receiver  operating  characteristic 
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Figure  6.40:  Scatter  plot  of  the  possibilistic  memberships  of  the  targets  as  a  function  of  their 
proportions  in  the  pixels  (Indian  Pines  data). 
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Figure  6.41:  Scatter  plot  of  the  CD-OSP  statistic  as  a  function  of  the  target  proportion  in  the  pixel 

(Indian  Pines  data). 
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(a)  OSP  detector 


(b)  AMSD  detector 


Figure  6.42:  Receiver  operating  characteristic  (ROC)  curves  (Indian  Pines  data). 


(ROC)  curves.  In  figure  6.42,  we  plot  the  ROC  curves  of  the  proposed  context  dependent  detectors 
versus  those  of  the  traditional  detectors  using  MVSA.  It  can  be  seen  from  figure  6.42  that  the 
proposed  context  dependent  target  detectors  outperform  the  traditional  MVSA  single  subspace 
detectors.  We  also  evaluate  the  performance  of  the  detectors  using  the  Area  Under  the  ROC  Curve 
(AUC).  Table  6.10  reports  the  average  and  standard  deviation  of  this  measure  for  the  CD,  EigVect, 
MVSA,  NFINDR  and  PPI  approaches  using  the  OSP,  AMSD  and  HSD  detectors  over  25  runs. 

Using  a  two-sample  t-test  at  the  5%  significance  level,  we  conclude  that  the  proposed  context 
dependent  detection  approach  outperformed  the  traditional  detection  approaches  when  the  OSP  and 
AMSD  detectors  were  used  {p- value  <  0.003).  For  HSD,  the  correlation  matrix  eigen  vectors  based 
method  outperformed  the  rest  of  the  methods  (p- value  <  le-10).  However,  the  context  dependent 
approach  still  outperformed  the  other  endmember  detection  methods.  This  can  be  explained  by  the 
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TABLE  6.10 


Average  (±  standard  deviation)  of  the  AUC  over  25  runs  for  the  CD,  EigVect,  MVSA,  NFINDR 
and  PPI  methods  (Indian  Pines  data) 


Detector 

OSP 

AMSD 

HSD 

CD 

0.9921  (±0.0000) 

0.9443  (±0.0507) 

0.9977  (±0.0000) 

EigVect3 

0.9822  (±0.0000) 

0.8916  (±0.0000) 

0.9978  (±0.0000) 

EigVect6 

0.9822  (±0.0000) 

0.8916  (±0.0000) 

0.9978  (±0.0000) 

MVSA3 

0.9426  (±0.0000) 

0.8928  (±0.0500) 

0.9725  (±0.0000) 

MVSA6 

0.9656  (±0.0000) 

0.8992  (±0.0476) 

0.9537  (±0.0001) 

NFINDR3 

0.8270  (±0.0566) 

0.4652  (±0.1206) 

0.6976  (±0.0006) 

NFINDR6 

0.6502  (±0.4482) 

0.5316  (±0.3951) 

0.7417  (±0.0368) 

PPI3 

0.3911  (±0.3438) 

0.5000  (±0.0000) 

0.1792  (±0.0590) 

PPI6 

0.5314  (±0.4465) 

0.5313  (±0.3947) 

0.3357  (±0.2581) 

better  description  of  the  background  using  the  RCDSU  algorithm,  which  also  ensures  that  there  is  no 
leakage  of  targets  into  the  endmember  sets  using  the  possibilistic  memberships.  We  also  notice  the 
robustness  of  the  context  dependent  approach  to  initialization  (AUC  standard  deviation  <  0.051). 

6.9.2  Evaluation  using  actual  targets 

The  University  of  Southern  Mississippi  data  described  in  Section  6.1.2  presents  differ¬ 
ent  targets  with  different  sizes.  Figure  6.43  shows  the  subimage  (67675  pixels)  considered  in  this 
experiment,  along  with  the  target  types,  locations  and  sizes  (the  circle  sizes  are  relative  to  the  tar¬ 
get  sizes).  The  targets  were  made  of  100%  cotton  fabric  and  were  emplaced  so  that  there  would  be 
representatives  of  each  color  type  completely  non-occluded,  partially  occluded,  and  almost  totally 
occluded.  As  a  consequence,  one  can  extract  pure  pixels  of  most  of  the  colors  but  it  is  unlikely  that 
any  algorithm  will  be  able  to  find  all  the  targets.  Each  target  was  assigned  a  confidence  number  (1- 
Visible,  2-  Probably  the  target,  3-  Possibly  the  target,  4-  Not  visible)  and  a  category  (0-  Target  not 
covered,  1-  Covered  partly  or  fully  with  shadow  but  no  tree,  2-  Part  or  all  of  the  target  is  covered 
by  a  tree  branch).  Table  6.11  summarizes  this  information  about  the  targets.  We  run  U-RCDSU 
using  Cmax  =  5,  M  =  3,  m  =  n  =  1.5,  a  =  0.1,  b  =  0.9,  a  =  100,  ft  =  4,  Vi,  ct,;  =  1,  Vi  and  e  =  0.1. 
The  algorithm  converges  to  C  =  2  clusters. 

We  evaluate  the  performance  of  the  detection  methods  using  the  area  under  the  ROC  curve 
measure.  Table  6.12  reports  this  measure  for  all  methods  for  the  “brown”  targets.  Two  cases  are 
analyzed:  all  target  sizes  combined  and  subpixel  targets  only.  For  all  target  sizes  combined,  we 
can  see  that  the  proposed  context  dependent  approach  gave  good  AUCs,  outperforming  the  other 
methods  when  the  AMSD  and  HSD  detectors  were  used  [p- value  <  0.02).  For  the  subpixel  targets, 
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TABLE  6.11 


Targets  in  the  University  of  Southern  Mississippi  data 


Target 

Size 

Confidence 

Category 

Total 

0.5 

1 

3 

1 

2 

3 

4 

0 

1 

2 

Brown 

5 

5 

5 

2 

3 

1 

9 

4 

3 

8 

15 

Dark  green 

5 

5 

5 

2 

0 

1 

12 

3 

2 

10 

15 

Faux  vineyard  green 

5 

5 

5 

3 

3 

3 

6 

4 

2 

9 

15 

Pea  green 

5 

5 

5 

2 

2 

3 

8 

4 

1 

10 

15 

Total 

20 

20 

20 

9 

8 

8 

35 

15 

8 

37 
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100 

150 

200 

260 

300 

Figure  6.43:  Subimage  of  the  University  of  Southern  Mississippi  data  with  target  types,  loca¬ 
tions  and  sizes 

the  context  dependent  target  detectors  gave  the  best  AUC  when  used  with  HSD  (p- value  <  0.05). 

Table  6.13  reports  this  measure  for  all  methods  for  the  “dark  green”  targets,  when  all  target 
sizes  are  considered  and  when  subpixel  targets  only  are  considered.  For  all  target  sizes  combined,  we 
can  see  that  the  proposed  context  dependent  approach  gave  good  AUCs,  outperforming  the  other 
methods  when  the  OSP  and  HSD  detectors  were  used  (p- value  <  0.006).  For  the  subpixel  targets, 
MVSA3  outperformed  all  other  methods  (p-value  <  0.003). 

Table  6.14  reports  this  measure  for  all  methods  for  the  “faux  vineyard  green”  targets.  For  all 
target  sizes  combined,  we  can  see  that  the  proposed  context  dependent  approach  gave  good  AUCs, 
outperforming  the  other  methods  when  the  HSD  detector  is  used  (p- value  <  0.005).  However, 
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TABLE  6.12 


Average  (±  standard  deviation)  of  the  AUC  over  25  runs  for  the  CD,  EigVect,  MVSA,  NFINDR 
and  PPI  methods  (University  of  Southern  Mississippi  data,  brown  target) 


Detector 

OSP 

AMSD 

HSD 

AUC  for  all  target  sizes 

CD 

0.9657  (±0.0000) 

0.9812  (±0.0000) 

0.9778  (±0.0000) 

EigVect3 

0.8665  (±0.0000) 

0.9365  (±0.0000) 

0.9107  (±0.0000) 

EigVect6 

0.9665  (±0.0000) 

0.9365  (±0.0000) 

0.9107  (±0.0000) 

MVSA3 

0.9652  (±0.0000) 

0.9788  (±0.0000) 

0.9671  (±0.0214) 

MVSA6 

0.9664  (±0.0000) 

0.9697  (±0.0000) 

0.9429  (±0.0128) 

NFINDR3 

0.7892  (±0.0000) 

0.8328  (±0.0000) 

0.7951  (±0.0037) 

NFINDR6 

0.9252  (±0.0257) 

0.9359  (±0.0186) 

0.8580  (±0.0938) 

PPI3 

0.8505  (±0.0506) 

0.9011  (±0.0382) 

0.8800  (±0.0086) 

PPI6 

0.8741  (±0.0095) 

0.9301  (±0.0058) 

0.8709  (±0.0001) 

AUC  for  target  size  0.5 

CD 

0.9773  (±0.0000) 

0.9854  (±0.0000) 

0.9891  (±0.0000) 

EigVect3 

0.9229  (±0.0000) 

0.9795  (±0.0000) 

0.9727  (±0.0000) 

EigVect6 

0.9229  (±0.0000) 

0.9795  (±0.0000) 

0.9727  (±0.0000) 

MVSA3 

0.9779  (±0.0000) 

0.9846  (±0.0000) 

0.9654  (±0.0000) 

MVSA6 

0.9817  (±0.0000) 

0.9774  (±0.0000) 

0.9659  (±0.0001) 

NFINDR3 

0.8656  (±0.0000) 

0.8979  (±0.0000) 

0.9823  (±0.0000) 

NFINDR6 

0.9765  (±0.0173) 

0.9884  (±0.0072) 

0.9190  (±0.1225) 

PPI3 

0.8973  (±0.0424) 

0.9243  (±0.0509) 

0.9414  (±0.0070) 

PPI6 

0.8805  (±0.0151) 

0.9723  (±0.0052) 

0.9318  (±0.0009) 

MVSA6  with  AMSD  gave  the  highest  AUC.  For  subpixel  targets,  MVSA3  with  HSD  gave  the 
highest  AUC  (p-value  <  0.003). 

Table  6.15  reports  this  measure  for  all  methods  for  the  “pea  green”  targets.  When  all  target 
sizes  are  considered,  we  can  see  that  the  proposed  context  dependent  approach  gave  good  AUCs, 
however  not  as  good  as  the  MVSA3  with  AMSD  and  HSD,  and  MVSA6  with  OSP  (jp- value  <  le-9). 
For  subpixel  targets,  the  context  dependent  target  detection  yielded  the  highest  AUC  when  HSD 
was  used  (p- value  <  0.01). 

We  can  also  notice  that  the  context  dependent  target  detection  approach  is  robust  to  ini¬ 
tialization  (very  small  standard  deviation). 

In  contrast  to  the  Indian  Pines  data  with  implanted  targets,  the  context  dependent  approach 
to  target  detection  was  not  always  the  best  compared  to  the  traditional  detectors.  The  nature  of 
the  targets  may  explain  this.  In  fact,  most  of  the  targets  (45  out  of  60)  are  covered  partly  or  fully 
whether  by  shadows  of  trees.  Moreover,  only  9  out  of  60  targets  are  labeled  as  ’’visible”  in  the 
ground  truth.  Most  of  the  remaining  ones  (35  out  of  51)  are  not  visible.  So,  detecting  these  would 
be  difficult.  Actually,  if  detected,  these  should  be  considered  as  false  alarms.  Unfortunately,  the 
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TABLE  6.13 


Average  (±  standard  deviation)  of  the  AUC  over  25  runs  for  the  CD,  EigVect,  MVSA,  NFINDR 
and  PPI  methods  (University  of  Southern  Mississippi  data,  dark  green  target) 


Detector 

OSP 

AMSD 

HSD 

AUC  for  all  target  sizes 

CD 

0.9533  (±0.0000) 

0.9760  (±0.0000) 

0.9781  (±0.0000) 

EigVect3 

0.9171  (±0.0000) 

0.9251  (±0.0000) 

0.9341  (±0.0000) 

EigVect6 

0.9171  (±0.0000) 

0.9251  (±0.0000) 

0.9341  (±0.0000) 

MVSA3 

0.9510  (±0.0000) 

0.9795  (±0.0000) 

0.9558  (±0.0000) 

MVSA6 

0.9149  (±0.0000) 

0.9518  (±0.0000) 

0.9165  (±0.0000) 

NFINDR3 

0.8874  (±0.0000) 

0.8649  (±0.0000) 

0.8676  (±0.0000) 

NFINDR6 

0.9088  (±0.0202) 

0.9224  (±0.0161) 

0.8370  (±0.0392) 

PPI3 

0.9418  (±0.0199) 

0.9672  (±0.0040) 

0.9189  (±0.0173) 

PPI6 

0.9179  (±0.0143) 

0.9569  (±0.0089) 

0.8978  (±0.0030) 

AUC  for  target  size  0.5 

CD 

0.9620  (±0.0000) 

0.9679  (±0.0000) 

0.9829  (±0.0000) 

EigVect3 

0.9132  (±0.0000) 

0.9693  (±0.0000) 

0.9765  (±0.0000) 

EigVect6 

0.9132  (±0.0000) 

0.9693  (±0.0000) 

0.9765  (±0.0000) 

MVSA3 

0.9634  (±0.0000) 

0.9758  (±0.0000) 

0.9858  (±0.0000) 

MVSA6 

0.9309  (±0.0000) 

0.9493  (±0.0000) 

0.9453  (±0.0000) 

NFINDR3 

0.9342  (±0.0000) 

0.9197  (±0.0000) 

0.9397  (±0.0000) 

NFINDR6 

0.9399  (±0.0065) 

0.9467  (±0.0088) 

0.8907  (±0.1154) 

PPI3 

0.9629  (±0.0028) 

0.9737  (±0.0086) 

0.9485  (±0.0019) 

PPI6 

0.9515  (±0.0148) 

0.9724  (±0.0012) 

0.9468  (±0.0011) 

number  of  visible  targets  for  each  type  is  not  enough  to  evaluate  the  methods  on  them  only.  This 
can  be  seen  in  figure  6.44  where  3  principal  components  (PC)  of  the  data  points  are  shown  in  blue 
dots  while  the  pixels  labeled  as  targets  are  shown  in  red  dots.  It  can  be  seen  that  most  of  the  targets 
are  located  in  the  background  cloud  and  that  only  few  of  them  are  “outliers”.  This  makes  them 
difficult  to  detect.  This  illustration  should  be  considered  with  a  grain  of  salt  as  it  only  represents 
the  data  in  a  reduced  dimension  space. 

Another  explanation  would  be  the  presence  of  many  outliers  in  the  data  (as  it  can  be  seen  in 
figure  6.44).  The  robust  unmixing  did  not  take  these  into  account  in  the  endmember  sets  formation. 
Hence,  they  do  not  fit  the  background  model  and  will  result  in  a  high  detection  score  as  opposed 
to  the  global  non  robust  unmixing  methods  which  consider  them  in  the  background  modeling,  and 
hence  not  detecting  them  as  false  alarms. 
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TABLE  6.14 


Average  (±  standard  deviation)  of  the  AUC  over  25  runs  for  the  CD,  EigVect,  MVSA,  NFINDR 
and  PPI  methods  (University  of  Southern  Mississippi  data,  faux  vineyard  green  target) 


Detector 

OSP 

AMSD 

HSD 

AUC  for  all  target  sizes 

CD 

0.9328  (±0.0000) 

0.9439  (±0.0000) 

0.9695  (±0.0000) 

EigVect3 

0.9078  (±0.0000) 

0.8917  (±0.0000) 

0.8416  (±0.0000) 

EigVect6 

0.9078  (±0.0000) 

0.8917  (±0.0000) 

0.8416  (±0.0000) 

MVSA3 

0.9317  (±0.0000) 

0.9378  (±0.0000) 

0.9639  (±0.0000) 

MVSA6 

0.9448  (±0.0000) 

0.9791  (±0.0000) 

0.9394  (±0.0000) 

NFINDR3 

0.8821  (±0.0000) 

0.8408  (±0.0000) 

0.8825  (±0.0000) 

NFINDR6 

0.9493  (±0.0160) 

0.9523  (±0.0226) 

0.9246  (±0.0714) 

PPI3 

0.8180  (±0.0659) 

0.9672  (±0.0040) 

0.8134  (±0.0533) 

PPI6 

0.9405  (±0.0034) 

0.9569  (±0.0089) 

0.9056  (±0.0016) 

AUC  for  target  size  0.5 

CD 

0.9668  (±0.0000) 

0.9708  (±0.0000) 

0.9891  (±0.0000) 

EigVect3 

0.9692  (±0.0000) 

0.9390  (±0.0000) 

0.9211  (±0.0000) 

EigVect6 

0.9692  (±0.0000) 

0.9390  (±0.0000) 

0.9211  (±0.0000) 

MVSA3 

0.9678  (±0.0000) 

0.9704  (±0.0000) 

0.9893  (±0.0000) 

MVSA6 

0.9585  (±0.0000) 

0.9792  (±0.0000) 

0.9428  (±0.0000) 

NFINDR3 

0.9376  (±0.0000) 

0.9038  (±0.0000) 

0.9679  (±0.0000) 

NFINDR6 

0.9677  (±0.0141) 

0.9706  (±0.0174) 

0.9842  (±0.0072) 

PPI3 

0.8465  (±0.0646) 

0.8363  (±0.0242) 

0.8937  (±0.0529) 

PPI6 

0.9727  (±0.0012) 

0.9186  (±0.0001) 

0.9362  (±0.0006) 

Figure  6.44:  Principal  components  of  the  University  of  Southern  Mississippi  data  (blue  dots) 
with  targets  (red  dots). 
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TABLE  6.15 

Average  (±  standard  deviation)  of  the  AUC  over  25  runs  for  the  CD,  EigVect,  MVSA,  NFINDR 
and  PPI  methods  (University  of  Southern  Mississippi  data,  pea  green  target) 


Detector 

OSP 

AMSD 

HSD 

AUC  for  all  target  sizes 

CD 

0.9563  (±0.0000) 

0.9672  (±0.0000) 

0.9693  (±0.0000) 

EigVect3 

0.9177  (±0.0000) 

0.7572  (±0.0000) 

0.7230  (±0.0000) 

EigVect6 

0.9177  (±0.0000) 

0.7572  (±0.0000) 

0.7230  (±0.0000) 

MVSA3 

0.9532  (±0.0000) 

0.9675  (±0.0000) 

0.9748  (±0.0000) 

MVSA6 

0.9663  (±0.0000) 

0.9245  (±0.0000) 

0.9480  (±0.0000) 

NFINDR3 

0.8604  (±0.0000) 

0.8264  (±0.0000) 

0.7816  (±0.0000) 

NFINDR6 

0.9582  (±0.0249) 

0.9426  (±0.0113) 

0.6097  (±0.1712) 

PPI3 

0.8200  (±0.0480) 

0.8519  (±0.0557) 

0.7358  (±0.0348) 

PPI6 

0.9657(±0.0003) 

0.9281  (±0.0002) 

0.7126  (±0.0000) 

AUC  for  target  size  0.5 

CD 

0.9741  (±0.0000) 

0.9472  (±0.0000) 

0.9769  (±0.0000) 

EigVect3 

0.9524  (±0.0000) 

0.7841  (±0.0000) 

0.7900  (±0.0000) 

EigVect6 

0.9524  (±0.0000) 

0.7841  (±0.0000) 

0.7900  (±0.0000) 

MVSA3 

0.9729  (±0.0000) 

0.9544  (±0.0000) 

0.9735  (±0.0000) 

MVSA6 

0.9663  (±0.0001) 

0.8925  (±0.0000) 

0.9469  (±0.0001) 

NFINDR3 

0.9344  (±0.0000) 

0.8400  (±0.0000) 

0.7774  (±0.0000) 

NFINDR6 

0.9672  (±0.0224) 

0.9351  (±0.0223) 

0.7299  (±0.1899) 

PPI3 

0.8586  (±0.0790) 

0.8628  (±0.0450) 

0.7956  (±0.0158) 

PPI6 

0.9766  (±0.0002) 

0.9316  (±0.0002) 

0.7892  (±0.0001) 
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CHAPTER  7 


CONCLUSIONS  AND  POTENTIAL  FUTURE  WORK 

7.1  Conclusions 

A  hyperspectral  unmixing  algorithm,  called  Context  Dependent  Spectral  Unmixing  (CDSU), 
that  finds  multiple  sets  of  endmembers  is  proposed.  Unlike  existing  hyperspectral  unmixing  meth¬ 
ods,  CDSU  is  a  local  approach  that  adapts  the  unmixing  to  different  regions  of  the  spectral  space. 
Consequently,  it  finds  multiple  sets  of  endmembers  that  represent  semantically  meaningful  regions 
of  the  hyperspectral  image. 

CDSU  is  based  on  defining  and  optimizing  a  novel  objective  function  that  combines  context  identifi¬ 
cation  and  hyperspectral  unmixing  into  a  joint  function.  This  function  models  contexts  as  compact 
clusters  and  uses  the  linear  mixing  model  as  the  basis  for  unmixing.  The  unmixing  provides  opti¬ 
mal  endmembers  and  abundances  for  each  context.  The  performance  of  CDSU  was  evaluated  and 
compared  to  similar  existing  methods  using  synthetic  and  real  data.  We  showed  that  the  proposed 
method  can  identify  meaningful  and  coherent  contexts,  and  appropriate  endmembers  within  each 
context.  We  also  showed  that  CDSU  is  more  robust  to  noise  than  similar  existing  methods. 

Several  extensions  of  the  CDSU  approach  were  also  proposed.  Due  to  the  high  dimensional 
and  correlated  nature  of  the  hyperspectral  data,  the  Euclidean  distance  used  in  CDSU  can  be  restric¬ 
tive  in  the  sense  that  it  limits  the  clusters  to  the  spherical  shape.  The  Context  Dependent  Spectral 
Unmixing  using  the  Mahalanobis  distance  (CDSUm)  was  proposed  to  overcome  this  limitation  and 
account  for  non-spherical  clusters.  This  was  achieved  by  using  the  Mahalanobis  distance  instead 
of  the  Euclidean  distance  in  the  clustering  component  of  CDSU.  This  allows  more  flexibility  in  the 
cluster  shapes  and  can  identify  ellipsoidal  clusters  where  the  variance  in  each  feature  dimension  is 
considered. 

CDSU  is  based  in  part  on  clustering.  However,  clustering  by  itself  is  a  challenging  task, 
especially  in  a  high  dimensional  space  as  is  the  case  of  hyperspectral  data.  Many  local  minima  solu¬ 
tions  may  exist.  To  overcome  this  problem,  we  proposed  two  semi-supervised  algorithms  of  CDSUm- 
These  methods  use  partial  supervision  information  to  guide  the  search  process  and  narrow  the  space 
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of  possible  solutions.  The  supervision  information  consists  of  small  sets  of  pairwise  constraints  which 
can  be  obtained  from  multiple  sources  of  information,  such  as  labeling  few  pixels  in  the  hyperspectral 
image  or  using  information  extracted  from  a  different  sensor.  The  first  semi-supervised  CDSU  algo¬ 
rithm,  called  Cluster  Constrained  Multi- Model  Unmixing  (CC-MMU),  uses  constraints  derived  from 
the  cluster  assignments  of  the  pixels.  The  second  algorithm,  called  Proportion  Constrained  Multi- 
Model  Unmixing  (PC-MMU),  uses  constraints  derived  from  the  proportion  values  of  the  pixels.  To 
validate  both  proposed  semi-supervised  CDSU  algorithms,  we  used  real  data  and  we  constructed  a 
set  of  constraints  using  information  provided  by  a  LIDAR  sensor,  as  well  as  information  extracted 
from  the  consensus  of  multiple  unmixing  algorithms. 

CDSU  is  sensitive  to  noise  and  outliers  present  in  the  hyperspectral  data  due  to  scene  and/or 
sensor  effects.  This  is  mainly  inherited  from  the  fuzzy  clustering  component  of  CDSU.  Noise  points 
affect  not  only  the  resulting  partition,  but  also  the  estimated  endmembers  and  proportions  within 
each  cluster.  Possibilistic  clustering  has  been  used  to  overcome  the  sensitivity  of  fuzzy  clustering  to 
noise.  This  approach  uses  possibilistic  memberships  to  identify  and  reduce  the  effect  of  noise  points. 
These  unconstrained  memberships  may  however  result  in  identical  clusters.  In  order  to  avoid  this 
problem,  we  proposed  a  Robust  Context  Dependent  Spectral  Unmixing  (RCDSU)  algorithm  that 
uses  both  fuzzy  and  possibilistic  memberships.  Fuzzy  memberships  are  used  to  partition  the  spectral 
space  into  multiple  sets  that  span  the  entire  space  and  avoid  coincident  clusters,  while  possibilistic 
memberships  are  used  to  reduce  the  effect  of  noise  and  obtain  robust  estimates  of  the  endmembers 
and  proportions  within  each  set. 

The  proposed  context  dependent  spectral  unmixing  algorithms,  like  other  multi-model  un¬ 
mixing  algorithms,  assume  that  the  optimal  number  of  endmember  sets  in  known  a  priori.  However, 
this  may  not  be  the  case,  and  it  should  be  learned  from  the  data.  To  address  this  challenge,  we 
proposed  an  Unsupervised  Robust  Context  Dependent  Spectral  Unmixing  (U-RCDSU)  algorithm 
that  finds  the  number  of  contexts  in  the  data  in  an  unsupervised  way.  U-RCDSU  exploits  the  fact 
that  the  possibilistic  memberships  are  not  constrained  to  sum  to  one.  U-RCDSU  starts  by  over- 
specifying  the  number  of  contexts,  then,  as  the  algorithm  iterates,  clusters  covering  the  same  dense 
regions  would  expand  and  become  similar.  These  are  ultimately  merged  and  the  number  of  clusters 
is  reduced. 

Spectral  unmixing  is  a  challenging,  ill-posed,  inverse  problem.  Many  algorithms  have  been 
proposed  for  robust,  stable,  and  accurate  unmixing  solutions.  Moreover,  different  algorithms  have 
different  modes  of  operation  and  usually  yield  different  results.  Even  the  same  algorithm  may  not 
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result  in  the  same  endmembers  when  run  multiple  times.  This  is  mainly  due  to  the  non-deterministic 
behavior  of  the  algorithm.  In  order  to  estimate  the  “optimal”  endmembers  for  a  given  data  set,  we 
proposed  taking  advantage  of  the  many  existing  unmixing  algorithms  and  exploring  their  similarity 
and  difference  using  consensus  analysis.  The  idea  has  its  roots  in  consensus  clustering.  The  proposed 
consensus  unmixing  combines  the  results  of  multiple  unmixing  algorithms,  run  multiple  times  with 
different  parameters,  to  find  consistent  endmembers  in  the  data.  The  claim  is  that  such  endmembers 
will  have  a  consensus  among  multiple  runs.  The  approach  starts  by  determining  the  set  of  points 
having  high  proportion  values  in  any  of  the  endmembers  of  the  unmixing  ensemble.  The  points 
corresponding  to  non  consistent  endmembers  are  then  filtered  out  using  a  co-association  measure, 
and  only  points  with  high  co-association  values  are  kept.  Finally,  the  remaining  points  are  clustered 
and  a  representative  from  each  cluster  is  chosen  to  be  a  consistent  endmember.  The  points  with 
high  co-association  values  have  also  been  used  to  form  a  set  of  constraints  on  the  proportions  for 
the  semi-supervised  PC-MMU  algorithm. 

Spectral  unmixing  is  used  as  an  initial  step  for  many  hyperspectral  target  detection  algo¬ 
rithms.  The  endmembers  are  used  to  model  the  background  and  compute  a  detection  statistic  for 
all  pixels.  This  is  known  as  structured  background  target  detection.  Most  of  these  detection  al¬ 
gorithms  suffer  from  two  main  problems.  First,  there  is  the  target  leakage  problem,  which  is  the 
contribution  of  the  targets  to  the  background  model.  This  is  due  to  the  presence  of  the  targets  in 
the  scene  being  modeled.  The  second  problem  is  the  background  modeling  itself.  Traditional  target 
detectors  use  global  unmixing  methods  to  model  the  background  using  a  single  set  of  endmembers. 
These  may  not  provide  a  good  description  of  the  hyperspectral  data,  especially  when  the  scene 
includes  multiple  regions  with  distinct  materials.  To  overcome  these  problems,  we  proposed  new 
context  dependent  target  detectors  that  are  based  on  the  robust  context  dependent  spectral  unmix¬ 
ing  algorithm  (RCDSU)  to  better  describe  the  background  and  minimize  target  leakage.  Targets 
can  be  though  of  as  outliers  and  hence  the  robust  unmixing  will  not  consider  them  when  modeling 
the  background.  Moreover,  the  multi-model  unmixing  provides  a  better  background  description. 
The  proposed  detectors  were  applied  to  the  traditional  Orthogonal  Subspace  Projection  detector, 
the  Adaptive  Matched  Subspace  Detector  and  the  Hybrid  Subspace  Detector  algorithms.  A  local 
detection  statistic  is  computed  for  each  context  and  then  all  scores  are  combined  using  the  fuzzy 
memberships  of  the  pixels. 

The  experimental  results  showed  good  performance  of  the  proposed  methods  on  synthetic 
and  real  data.  The  lack  of  reliable  ground  truth  information  was  a  limiting  factor  when  evaluating 
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the  proposed  algorithms  on  real  data  sets.  Qualitative  evaluation  was  employed  in  most  cases.  Data 
with  actual  targets  were  scarce.  The  University  of  Southern  Mississippi  data  was  the  only  available 
data  with  various  targets  and  various  background  materials.  The  only  drawback  was  the  occlusion 
which  affected  most  of  the  targets  present  in  the  scene. 

7.2  Potential  Future  Work 

Although  our  proposed  work  has  shown  promising  results,  there  is  still  room  for  improve¬ 
ment.  The  following  sections  list  the  various  areas  that  will  be  explored  in  the  future  to  build  upon 
the  developed  algorithms. 

7.2.1  Large  scale  evaluation 

Hyperspectral  data  is  not  very  abundant.  Only  few  public  data  sets  are  available.  Moreover, 
ground  truth  information  is  usually  not  accurate  if  not  missing.  Even  though  the  results  on  the  few 
used  data  sets  were  promising,  more  evaluation  is  still  required.  In  particular,  evaluation  using  very 
large  image  where  the  notion  of  contexts  is  more  important. 

The  National  Science  Foundation  funded  National  Ecological  Observatory  Network  (NEON1) 
will  be  one  main  source  of  data.  NEON  is  a  continental-scale  ecological  observation  system  for 
examining  critical  ecological  issues.  NEON  is  designed  to  gather  and  synthesize  data  on  the  impacts 
of  climate  change,  land  use  change,  and  invasive  species  on  natural  resources  and  biodiversity.  Data 
will  be  collected  from  106  sites  (60  terrestrial,  36  aquatic  and  10  aquatic  experimental)  across  the 
U.S.  using  instrument  measurements  and  field  sampling.  The  sites  have  been  strategically  selected 
to  represent  different  regions  of  vegetation,  landforms,  climate,  and  ecosystem  performance.  NEON 
will  combine  site-based  data  with  remotely  sensed  data  and  existing  continental-scale  data  sets  (e.g. 
satellite  data)  to  provide  a  range  of  scaled  data  products  that  can  be  used  to  describe  changes  in 
the  nations  ecosystem  through  space  and  time. 

NEON  successfully  completed  the  planning  and  design  phases  and  entered  the  construction 
phase  in  Spring  2012.  It  is  currently  building  sites.  Constructing  the  entire  NEON  network  will  take 
approximately  five  years,  so  it  is  expected  to  be  in  full  operation  by  approximately  2017.  NEON 
will  collect  data  for  30  years  and  will  have  an  open-access  approach  to  its  data  and  information 
products.  Once  this  data  is  collected  and  made  available,  it  can  be  used  to  compare  existing  multi¬ 
model  unmixing  methods. 

1  See  more  at:  http://www.ne0ninc.0rg/ab0ut/0verview#sthash.0Z8Jcudt.dpuf 
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7.2.2  Target  detection 


Subpixel  target  detection  is  a  difficult  task.  Our  experiments  showed  that  no  single  detector 
can  consistently  outperform  the  others.  The  fusion  of  multiple  detectors  is  a  potential  area  of 
research.  This  would  overcome  the  weaknesses  of  some  detectors  and  take  advantage  of  the  strength 
of  the  others.  More  importantly,  fusion  could  be  adapted  to  the  different  contexts  identified  during 
the  unmixing  process. 

Currently,  in  the  proposed  context  dependent  detectors,  only  fuzzy  memberships  are  used.  Future 
work  may  include  investigating  the  use  of  the  possibilistic  memberships  in  the  detectors,  as  opposed 
to  only  using  them  in  estimating  the  endmember  sets. 

7.2.3  Multi-sensor  fusion 

Hyperspectral  sensors  can  be  used  along  with  other  remote  sensors  such  as  LIDAR  and 
Synthetic  Aperture  Radar  (SAR)  in  order  to  take  advantage  of  the  different  information  provided 
by  each  sensor.  For  instance,  LIDAR  provides  elevation  information,  while  SAR  provides,  among 
other,  structural  information  about  the  scene.  This  would  help  directly  or  indirectly  (e.g.  generating 
supervision  constraints)  in  the  unmixing  process. 

7.2.4  Non-linear  unmixing 

The  proposed  work  has  focused  only  on  the  linear  mixing  model.  This  assumes  that  materials 
are  mainly  mixed  due  to  the  spatial  resolution  of  the  sensor,  and  that  the  intimate  mixing  of 
the  materials  on  the  ground  is  negligible.  As  a  potential  future  work,  the  proposed  multi-model 
formulation  could  be  used  to  generalize  non-linear  unmixing  methods. 
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APPENDIX  A 


Proof  of  Theorem  3.2.1 

Theorem.  The  first  and  second  order  conditions  yield  the  following  local  minimizers  of  J : 
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Proof.  To  obtain  the  optimal  endmember  set  Ej.  we  set  the  derivative  of  the  Lagrangian  L 
with  respect  to  E,  to  zero,  i.e. , 
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Solving  (A. 6)  for  Ej,  we  obtain 
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We  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative 
(3.8)  with  respect  to  Ep 
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Since  is  an  M  x  M  matrix,  we  need  to  verify  whether  it  is  positive  definite  to  make  sure  that 
the  solution  endmember  set  E.(  is  indeed  a  local  minimizer  of  J.  Let  e  be  a  non  zero  M  x  1  vector, 
we  check  if  e1  §j4e  >  0: 
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where  ||.||  denotes  the  Euclidean  norm. 
We  know  that 
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Therefore,  the  solution  endmember  set  E,  is  indeed  a  local  minimizer  of  the  objective  function  J. 

To  obtain  the  optimal  proportions  p.y ,  we  set  the  derivative  of  L  in  (3.8)  with  respect  to  Py- 
to  zero,  i.e. , 


d  L 
dp  u 


=  0, 


which  leads  to 


*7  *  j  *j 

Solving  (A.  13)  for  Py ,  we  obtain 


— 2a(u™EjXJ  -  //:;;K,K/  p^)  -  -.y  l  A/xi  -  &  =  0. 


T 

p  a  = 


l 


E»Ef  I  I E jxj  +  (nfij Imx i  +  ) 

4 J 


(A.12) 


(A.13) 


(A. 14) 


The  Karush-Kuhn- Tucker  (KKT)  conditions  [83],  namely  the  dual  feasibility  and  the  complementary 
slackness,  state  that  £y  should  be  equal  to  zero.  Finally,  we  enforce  the  non-negativity  constraint 
on  the  proportion  values  and  get 


Py  =  max 


i  — 1  r 


e8e/ 


EixT  +  -^-lMxl 
.  *  J  2 era™  Mxl 


,0  . 


(A.15) 
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If  the  proportion  values  are  clipped  at  zero,  we  do  renormalize  them  to  ensure  that  they  sum  to  one. 
Using  the  fact  that  l1XMpfj  =  1,  we  solve  for  7 y  and  obtain 


2ceu%[l  -  lixM(Ei;Ef)~1E.ixj] 
lixM(EjEf  )_11mxi 


(A.16) 


This  leads  to 


p  fj  =  max 


F  PT1  1[fxT  I  1  1lxM(E,;Ei  )  E iXj  1 

[E,E,  j  [E(x,  +  llxM(EiETr,lMxl  1  A/x  1 J . 0 


(A. 17) 


Similarly,  we  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative 


of  L  in  (3.8)  with  respect  to  Py: 


Op2.  ~~  ' 


(A.18) 


Since  4-%-  is  an  M  x  M  matrix,  we  can  verify  whether  it  is  positive  definite,  to  make  sure  that  the 

°Pij 

solution  proportions  ptJ  are  indeed  a  local  minimizer  of  J,  by  verifying  that  p  7  A^p  >  0  for  any 
non  zero  M  x  1  vector  p: 


P1  f-^P  =  2(Wi™pTE,:E(r  p 
&P  fj  3 

=  2a^(Efp)T(Efp) 
=  2a<"||Efp||2 


>  0.  (A. 19) 

Hence,  the  solution  proportions  p.y  are  indeed  a  local  minimizer  of  the  objective  function  J. 

To  obtain  the  optimal  memberships  u.j:j ,  we  set  the  derivative  of  L  in  (3.8)  with  respect  to 


Uij  to  zero,  i.e. , 


and  obtain: 


(A. 20) 


L(xj  -  Ci)(xj  -  Ci)T  +  amu™  1(xj  -  p,,E,Hxy-  -  p.yE;)T  -  X3  =  0. 


(A.21) 


Solving  (A.21)  for  Wy,  we  obtain 


i[(xj  -  c»)(xj  -  c i)T  +  a(xj  -  pyE i)(xj  -  PyEi)T] 


(A. 22) 


Using  the  fact  that  ui3  =  1,  we  solve  for  A  j,  and  obtain 

i=l 


X3  -  ^2 


-r[  L TO[(xj  -  c0(xj  -  c i)T  +  a(xj  -  PijEj)(xj  -  Py Ej)T] 


1  n  1 —m 

n-l 


(A. 23) 
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This  leads  to 


i  X ,  -  ci)(xi  -  c,  )7  +  a(xj  -  PijE i)(xj  -  p, ,  K, )T 


(A. 24) 


E  [(xj  ^  cg)(xj  -  cg)T  +  «(xj  ~  P gjEg)(x.j  -  PgjEg)T] 


Ttfc. 


9=1 


Similarly,  we  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative 
of  L  in  (3.8)  with  respect  to  Uij: 

d2L 


du2  =  m(m-l)u^  2(xj  -  Cj)(xj  -  Cj)T  +  am(m  -  l)vy  2ix,  p, , F2, )( x,  p,,K,)' 


,m—2 


=  m(m-l)u™  2\\xj  -  Ci\\2  +  am(m- l)u™  2||xj  -  p^E^A 


(A. 25) 


t- 

Since  m  >  1,  AA  is  a  positive  scalar.  Therefore,  the  solution  memberships  Uij  are  indeed  a  local 

Uij 

minimizer  of  the  objective  function  J. 

To  obtain  the  optimal  centers  c we  set  the  derivative  of  L  in  (3.8)  with  respect  to  c,  to 
zero,  i.e., 


dL 

dc. 


=  0, 


which  leads  to 


Solving  (A. 27)  for  c we  obtain 


N 


2  ^  '  Ujj  ( Xj  a)  —  0. 

3= 1 


N 

E  <yX; 

1=1 


(A. 26) 


(A. 27) 


C,  = 


N 

E«« 

j= i 


(A. 28) 


Similarly,  we  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative 
of  L  in  (3.8)  with  respect  to  c 


d2L 

dc2 


N 


1=1 


*J  ■ 


(A. 29) 


Since  >  0  for  all  i,  j,  is  a  positive  scalar.  Therefore,  the  solution  centers  c,  are  indeed  a  local 
minimizer  of  the  objective  function  J.  □ 
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APPENDIX  B 


Proof  of  Theorem  3.3.1 


Theorem.  Optimizing  the  objective  function  in  (3.13)  using  the  Lagrange  multipliers  method  leads 
to  the  same  update  equations  for  the  endmember  sets,  the  proportions,  and  the  centers  as  for  CDSU 
using  the  Euclidean  distance  (equations  (3.9),  (3.10)  and  (3.12)  respectively). 

The  update  equation  for  the  memberships,  Utj,  becomes 


rU>ii  — 


(Xj  -  c,  -  c i)T  +  a(xj  -  PijEj)(xj  -  p,,K,;' 


v  “  c 

E  (Xj  -  C9)A</(Xi  -  C«)T  +  «(XJ  -  P«E9)(XJ  -  P qjVqV 

q=  1  L 

Finally,  the  update  equation  for  the  norm  matrices,  A,;,  is 

A,  =  [jidetiCip  C~% 


(B.l) 


where 


N 


E  -  ci)  (xi  -  ci) 


c,  = 


3- 1 


N 

E 

j  =  l 


(B.2) 


(B.3) 


is  i/ie  /tizzy  covariance  matrix  of  cluster  i. 


Proof.  We  incorporate  the  constraints  in  (3.4),  (3.5),  and  (3.14)  into  the  objective  function  in  (3.13) 
using  Lagrange  multipliers  and  obtain 

C  N 

L  =  E  E  <Kxi  -  c;)A;(xj  -  a)T 

i=l  j=l 

C  r  JV 

+°'  E  E  "!>J  -  P»jEi)(xj  -  p, ,  E,  i 7  +  /3i(Mtroce(EiEf )  -  1 :  x M  E,  Ef lMxi) 

i= 1  j=l 
N  /  C 


Jv  /  O  \  o  IV  O  JV  o 

E  M  E  -  !  -  EE  7u(1ixmp5  -  1)  -  E  E  EpE  -  E  Si(det(Ai)  -  (7i),  (B.4) 

i  =  l  v  i=l  '  1=1 .7=1  i=l  7  =  1  i=l 


C  Jf  C  JV  _  c 

E  E  7y(iixMp£  - 1)  -  E  E 

j=l  '*=1  '  i—1  j=l  i=l  j—1 

where  A  =  [An,  ...,  An]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  N  constraints  on 
the  memberships  it^  in  (3.4),  T  =  pyn,  ...,  Jcn]  and  S  =  [£h,  ...,  ^cjv]  are  vectors  of  Lagrange 
multipliers  corresponding  to  the  C  x  N  constraints  on  the  proportions  p,;,  in  (3.5),  and  A=[<5i,  ..., 
5c\  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  C  constraints  on  the  norm  matrices  A, 
in  (3.14). 


114 


The  proof  of  the  first  part  of  the  theorem,  concerning  the  endmember  sets,  the  proportion 
sets  and  the  centers,  is  similar  to  the  one  in  Appendix  A. 

To  obtain  the  optimal  memberships  ul3 ,  we  set  the  derivative  of  L  in  (B.4)  with  respect  to 
to  zero,  i.e. , 


dL 

duij 


=  0, 


(B.5) 


and  obtain: 


L (xj  -  c,  :■  A ,  i  x ,  -  Ci)T  +  amu %  1(xj  -  p,  ,K,  six,  -  PijEj)T  -  X3  =  0.  (B.6) 


Solving  (B.6)  for  Ui3,  we  obtain 


_ E _ 

[m[{x3  -  a) A i(x3  -  c i)T  +  a(xj  -  p.yE i)(x3  -  p, ,  E,  )T 


(B.7) 


Using  the  fact  that  ^  Uij  =  1,  we  solve  for  A j,  and  obtain 

i=  1 


^3  ~ 


■  c 

v 

1 

1  1 

m  —  1 

_i=l 

m[(xj  -  C,;)A i(xj  -  C,ir  +  a{x.j  -  pvK,)ix,  -  pijEi)T}_ 

This  leads  to 


Un  — 


(xj  Ci)Ai (xj  a)  T  ^(xj  Piji-Eh) (xj  p^'Ej) 


l3  ~  C  _ 

E  [(x3  -  C?)A«(XJ  -  C«)T  +  «(xi  -  P«E9)(XJ  -  P?JE«)T] 

9=1 


(B.8) 


(B.9) 


We  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative  of  L  in 
(B.4)  with  respect  to  Ui3: 
d2  L 

=  m(m  —  l)u"^~2(x3  —  a)Ai(xj  —  c;)T  +  am(m  —  1  )«™_2(xj  —  |>, , E, ) ( x ,  —  p  Iv  i ' .  (B.10) 

Since  m  >  1,  is  a  positive  scalar.  Therefore,  the  solution  memberships  Ui3  are  indeed  a  local 
minimizer  of  the  objective  function  Jm- 

To  obtain  the  optimal  norm  matrices,  A;,  we  set  the  derivative  of  L  in  (B.4)  with  respect 


to  A;  to  zero,  i.e., 


and  obtain: 


dL 
dA  i 


=  0, 


N 


Yu^ixj  -  Ci)T(x3  -  a)  -  Sidet(Ai)Ai  1  =  0. 
i=i 

Solving  (B.12)  for  A;,  we  obtain 


A,;  = 


N 


n  -1 


E  "IEx./  -  c i)1  ( xj  -  c i) 

3=1 _ 

5idet(Ai ) 


(B.ll) 


(B.12) 


(B.13) 
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We  define  the  fuzzy  covariance  matrix 


E  ~  Ci)T(Xj  -  C i) 

Ci  =  — - ^ - •  (B.14) 

E«™ 

j= 1 


Using  (B.14)  gives 


_  Sjdetj Aj)  ! 

i  N  *  ' 

E«« 

j= i 


Using  the  fact  that  det(Ai)  =  ai ,  we  solve  for  and  obtain 


This  leads  to 


det(Ci) 


5,= 


N 

e  < 

j=i 


A,  =  [^^(Q^scr1. 


(B.15) 


(B.16) 


(B.17) 


Similarly,  we  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative 


of  L  in  (B.4)  with  respect  to  A,: 

d'2L 

dA2 


—Si  [det( Ai)Ai  2  -  det(Ai)Ai  2 


0. 


(B.18) 


Therefore,  the  test  is  inconclusive.  We,  hence,  move  to  a  higher  derivative  test.  We  compute  the 
third  order  derivative  of  L  in  (B.4)  with  respect  to  A,: 

Q3t 

=  -3Sidet{  Aj)A“3.  (B.19) 

According  to  (B.16),  Si  >  0.  Hence  <  0.  An  odd  derivative  order  of  negative  sign  indicates 
that  the  critical  solution  At  is  neither  local  maximum  nor  local  minimum.  It  is  instead  a  point  of 
decrease  for  the  function  Jm-  In  fact,  it  is  also  a  point  of  inflection,  though  that  is  not  of  relevance 
here.  □ 
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APPENDIX  C 


Proof  of  Theorem  3.4.1 


Theorem.  The  update  equations  for  the  endmember  sets,  the  abundances,  the  centers  and  the  norm 
matrices  are  similar  to  the  ones  of  CDSUm  (equations  (3.9),  (3.10),  (3.12)  and  (3.16)  respectively) . 
The  update  equation  for  the  memberships  becomes 

1 

d2a  +  7  costij  +  afitij 

Uij  =  V - (C-l) 

E  dlj  +  7 costqj  +  afitgj 

9=i  L  J 

where 

d2ij  =  (xi  ^  c*)Ai(xj  -  Ci)T,  (C.2) 

_  c 

costij  =  y  ]  y '  Pj^ik  +  y  '  pjkUiki  (e.3) 

(j,k)es  1=1, l^i  a,k)^M 

and 

fitij  =  (xj  -  pjjEi)(xj  -  p,  ,K,  )T.  (C.4) 

Proof.  We  incorporate  the  constraints  in  (3.4),  (3.5),  and  (3.14)  into  the  objective  function  in  (3.18) 
using  Lagrange  multipliers  and  obtain 

C  N 

i=EE  -  <V'A,:.X;  -  C  i)T 

i= 1  j= 1 

+i(  E  E  E  E  E  Pjku%u%) 

y(j,k)eSi=n=i,i^i  (j,k)eATi=i  ' 

C  r  N  1 

+Q  E  E  uij(xl  -  PijEi)(xj  -  PyEj)T  +ft(Mtrace(EjEf)  -  lixMEiEf  lMxi) 

i=l  L  j— l  -I 

N  ,  C  \  C  N  C  N  C 

-  E  Ad  E  “y  -  i  -EE  <Pij(iixMp£  - 1)  -  E  E  EpE  -  E  <E(rfe*(Ai)  -  cr4),  (c.5) 

ji — 1  x  '  *=1  j  —  1  i=l  j= 1  i=l 

where  A  =  [Ai,  ...,  Ajv]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  N  constraints  on 
the  memberships  in  (3.4),  $  =  [<p n,  ...,  ipcN ]  and  H  =  [£n,  ...,  £,cn]  are  vectors  of  Lagrange 
multipliers  corresponding  to  the  C  x  N  constraints  on  the  proportions  p.(J  in  (3.5),  and  A=[<h>  ..., 
<5c]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  C  constraints  on  the  norm  matrices  A, 
in  (3.14). 
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The  proof  of  the  first  part  of  the  theorem,  concerning  the  endmember  sets,  the  proportion 
sets  and  the  centers,  is  similar  to  the  one  in  Appendix  A.  The  proof  concerning  the  norm  matrices 
is  similar  to  the  one  in  Appendix  B. 

To  obtain  the  optimal  memberships  ut] ,  we  set  the  derivative  of  L  in  (C.5)  with  respect  to 
Uij  to  zero,  i.e. , 


dL 

duij 


=  0, 


(C.6) 


and  obtain: 


mu a 


L(xj  -ci)Ai(xi-ci)T+7 


(  E  E  mPjkU™  1ug+  E  ™pjku\ 

'  ( j,k)es  1=1, i^i  (j,k)eAf 


xik 


L(xj  -  p, , K,  Hx,  -  PijEj)T  -  A j  =  0. 


(C.7) 


We  rewrite  (C.7)  as 


mu™  ld2ij  +  7 1costij  +  amu™  1  fit jj  -  A j  =  0, 


(C.8) 


where 


dij  —  (xj  Cj)A,(xj  c i)  , 

C 

COStij  =  Pjk^ik  +  Pjku7k> 

(j,k)es  U>k)GAf 


and 


=  (Xj  -  pi:,-E i)(xj  -  p, ,  E, ) 7  . 


Solving  (C.8)  for  Uij,  we  obtain 


U{j  — 


A, 


m[df  •  +  7  costij  +  afitij \ 


Using  the  fact  that  u^j  =  1,  we  solve  for  A  j,  and  obtain 


\i  — 


’  c 

v 

1 

1  “1 

m—1 

i—1 

m[d2j  +  7 costij  +  afitij } 

This  leads  to 


dfj  +  7 costij  +  afitij 


uij=  c 


E  d2:  +  7  COStqj  +  afitq 


9=1  L 


(C.9) 

(C.10) 

(C.ll) 

(C.12) 


(C.13) 


(C.14) 


We  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative  of  L  in 
(C.5)  with  respect  to  Uif 


i(m  —  1  )u™j  2dfj  +  7 m(m  —  l)it™  2 costij  +  am(m  —  l)w™  2 fitij. 


(C.15) 


118 


r\  2  t 

Since  m  >  1,  is  a  positive  scalar.  Therefore,  the  solution  memberships  Uij  are  indeed  a  local 

Uij 

minimizer  of  the  objective  function  Jq.  □ 
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APPENDIX  D 


Proof  of  Theorem  3.4.2 


Theorem.  The  update  equations  for  the  endmemher  sets,  the  memberships,  the  centers  and  the  norm 
matrices  are  similar  to  the  ones  of  CDSUm  (equations  (3.9),  (3.15),  (3.12)  and  (3.16)  respectively) . 
The  update  equation  for  the  proportions  becomes 

pfj  =  max  I  2cm^E;Ef  +  7  E  Pj  k  j  ^auij  EjX^  +  7  ^  (  PjkPik  T  (ij  1 M  x  1  j  J  5  (E*l) 

\  0',fc)es  (j,k)es  J 


where 

1  -  lixM  2au^EjEf  +7  J]  Pjk  2au™E,xJ +7  E  PjkPjk 
c  _  (j,k)es  ( j,k)es  m 

si?  —  r  _1  ■ 

lixM  2au^EjEf  +  7  X]  Pjfc  Imxi 

L  0',fc)65  J 

Proof.  We  incorporate  the  constraints  in  (3.4),  (3.5),  and  (3.14)  into  the  objective  function  in  (3.23) 
using  Lagrange  multipliers  and  obtain 

C  N  C 

i=EE  <KXJ  -  c^A^Xj  -  ct)T  +7  X]  Pjk  E  l|p*j  -  Pifcll2 


»=ij=i 


{j,k)es  *=i 


C  r  N 

+Q-  E  E  uij  (XJ  -  PiiEi)(xi  -  P»jEi)T  +  (3i{Mtrace( E,Ef )  -  llxME,Ef  lMxi) 

i=i  Lj=i  J 

N  ,  C  \  C  N  C  N  C 

-  E  E  -  i  -  EE  ^(iixMP^  - 1)  -  E  E  Tijpfj  -  E  <E(def(A!;)  -  at),  (d.3) 

7  =  1  V  }=1  '  2=1  1  =  1  2=1  7  =  1  2  =  1 


where  A  =  [Ai,  ...,  An]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  N  constraints  on 
the  memberships  it^  in  (3.4),  H  =  [£h,  ...,  £cn]  and  =  [vih  •••)  Tcn]  are  vectors  of  Lagrange 
multipliers  corresponding  to  the  C  x  N  constraints  on  the  proportions  p ^  in  (3.5),  and  A=[<5i,  ..., 
5c]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  C  constraints  on  the  norm  matrices  A, 
in  (3.14). 

The  proof  of  the  first  part  of  the  theorem,  concerning  the  endmember  sets,  the  memberships, 
the  centers,  and  the  norm  matrices  is  similar  to  the  one  in  Appendix  B. 

To  obtain  the  optimal  proportions  ,  we  set  the  derivative  of  L  in  (D.3)  with  respect  to 
Pij  to  zero,  i.e. , 
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which  leads  to 


-2a(^EixJ-Wi”EiEfp^)+7  E  PjkijPij  Pifc)  Cij^-Mxl  —  0.  (D.5) 

U,k)es 


Solving  (D.5)  for  Py,  we  obtain 


T 

Py  = 


2n//'"E,E,/  +  7  E  pik 

(j,k)eS 


1-1 


2cru”)EixJ’  +  7  Y  PjkVfk  +  Cij1  M  x  1  +  tpij 
U,k)&S 


(D.6) 


The  Karush-Kuhn-Tucker  (KKT)  conditions  [83],  namely  the  dual  feasibility  and  the  complementary 
slackness,  state  that  <Py  should  be  equal  to  zero.  Finally,  we  enforce  the  non-negativity  constraint 
on  the  proportion  values  and  get 


pfj  =  max 


2a<™Ei:Ef  +  7  Y  Pok 

U,k)es 


-l 


2n  i/"'  E,  x  f  --  Y  PjkPfk  dfeijuxi  ,0  )  ,  (D.7) 
U,k)eS 


If  the  proportion  values  are  clipped  at  zero,  we  do  renormalize  them  to  ensure  that  they  sum  to  one. 
Using  the  fact  that  lixMpfj  =  1,  we  solve  for  £y  and  obtain 


£*?  — 


1  —  llxM 

2(w/":E.  E,'  +7  E  Pjk 

-1 

2'w/y  E,x  (  +7  E  PjkPjk 

L  U,k)  65  J 

1  (j,k)es  J 

LlxM 


2o<;e,e/  +7  E  /'//.• 


i-i 


(D.8) 


Olxl 


0,fc)65 

We  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative  of 
L  in  (D.3)  with  respect  to  Pyt 

d2L 


=  2a<E1Ef+7  Y 
(i,k)es 


Pjk- 


(D.9) 


Since  JUp  is  an  M  x  M  matrix,  we  can  verify  whether  it  is  positive  definite,  to  make  sure  that  the 
solution  proportions  Py  are  indeed  a  local  minimizer  of  Jp,  by  verifying  that  p7  P-Pfp  >  0  for  any 

°Pij 

non  zero  M  x  1  vector  p: 


PTf“uP  =  2au™pTEiEf p  +  7  Y  PjkPTP 
Pil  (. j,k)es 

=  2an™(Efp)T(E7p) +7  Y  Pjk\\p\ 

U,k)es 

=  2au™||Efpj|2 +7  Y  PjkWpf 


U,k)es 


>  0. 


(D.10) 


Hence,  the  solution  proportions  Py  are  indeed  a  local  minimizer  of  the  objective  function  Jp. 


□ 
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APPENDIX  E 


Proof  of  Theorem  3.5.1 


Theorem.  The  update  equations  for  the  proportions  and  the  memberships  are  similar  to  the  ones 
of  CDSUm  (equations  (3.10)  and  (3.15)  respectively). 

The  update  equation  for  the  endmember  sets,  E* ,  is 


N 

-1 

"  iV 

Ei  = 

/3i(MIMxM  -  1  mxm)  +  y^fauTj  + 

^2(au™  +  III )p!;Xj 

3  =  1 

.3=1 

The  update  equation  for  the  norm  matrices,  A j  ,  is  the  same  as  in  equation  (3.16),  but  with 

E  (auij  +  ~  c i)T(xi  -  c i) 

C*  =  ^ - .  (E.2) 

E  (auij  +  btij) 

3= 1 

The  update  equation  for  the  typicalities,  tij,  is  given  by 


tij  — 


1  + 


(E.3) 


where 


costij  =  [Xj  -  a) A i(xj  -  c,  I T  +  a(xj  -  p,,K,  Six,  -  p,  ,K,  S  ' 


(E.4) 


Finally ,  the  update  equation  of  the  cluster  centers  c i  is 


N 

E  {any  +  hr;/)*, 

3  =  i _ 

N 

E  +  btf-) 

3= 1 


(E.5) 


Proof.  We  incorporate  the  constraints  in  (3.4),  (3.5),  and  (3.14)  into  the  objective  function  in  (3.26) 
using  Lagrange  multipliers  and  obtain 

c  n  c  N 

i  =  EE  (auij  +  bt?j)(xj  -  Ci)A  i(xj  -  c  i)T  +  e  Vi  E  (i  -  Uj)n 

2—1 j  —  1  2—1  J  — 1 

C  r  N 

+ot  E  E  (au(i  +  btij)(xj  -  P ijEi)(xj  -  p, ,  E,  1 7  +  ft(Mtroce(EiEf )  -  lixMEjEf  lMxi) 


*=i  Lj=i 

AT  /  c 


i\  /  o  \  C  N  oiv  o 

E  A-(  E  -  !  -  EE  1 X  A/p/;  -  1)  -  E  E  Ep';  -  E  Sl(det{Al)  -  a*),  (E.6) 

7=1  vi=l  '  1=11=1  t=l .7=1  *=1 


C  JV 

E  E 

=11=1 
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where  A  =  [Ai,  Xn]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  N  constraints  on 
the  memberships  Ujj  in  (3.4),  T  =  [711,  jcn]  and  S  =  [£n,  (cjv]  are  vectors  of  Lagrange 
multipliers  corresponding  to  the  C  x  N  constraints  on  the  proportions  p 7  in  (3.5),  and  A=[<5i,  ..., 
5c]  is  a  vector  of  Lagrange  multipliers  corresponding  to  the  C  constraints  on  the  norm  matrices  A, 
in  (3.14). 

The  proof  concerning  the  proportions  is  similar  to  the  one  in  Appendix  A.  The  proof  con¬ 
cerning  the  memberships  is  similar  to  the  one  in  Appendix  B.  The  proof  concerning  the  endmember 
sets,  the  centers  and  the  norm  matrices  is  similar  to  the  one  in  Appendix  A  and  Appendix  B,  with 
the  exception  of  using  the  weighted  memberships  ( au +  bt't  )  instead  of  the  fuzzy  memberships  u'-j . 

To  obtain  the  optimal  typicalities  Uj,  we  set  the  derivative  of  L  in  (E.6)  with  respect  to  Uj 


to  zero,  i.e., 


and  obtain: 


dL 

dti 


=  0, 


(E.7) 


nbt^j  1(xj  -  ci)Ai(xj  -  Ci)T  +  ant”  1(xj  -  piiEi)(xi  -  p^E* )T  -  my^l  -  UjT  1  =  0.  (E.8) 


ij  Hy1 


Rearranging  the  terms  of  (E.8),  we  get 

,  n  n- 1 


1  -  U 


^"i) H-  ip^-j  P 


Solving  (E.9)  for  Uj ,  we  obtain 


tij  — 


1  + 


where 


COStij  (  Xy  C,  )  A  ,  (Xy  C  /  )  '  O  f  X  y  P  /j  E,  )  (Xy  P?y  E;  ) 


(E.9) 


(E.10) 


(E.ll) 


We  check  the  second-order  sufficient  condition  by  computing  the  second-order  derivative  of  L  in 
(E.6)  with  respect  to  Uj : 
d  2l 

Og-  =  n(n-l)bt™~2(xj  -Cf)Ai(xj-Ci)T+  an(n-l)  ty“2(Xj  -ppEi)(xj-pijEi)T+n(n-l)r7i(l-tij)"_2. 

(E.12) 

r\2,  T 

Since  n  >  1,  (Ay  is  a  positive  scalar.  Therefore,  the  solution  memberships  Uj  are  indeed  a  local 


minimizer  of  the  objective  function  Jr. 


□ 
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